ABSTRACT


This  thesis  develops  analytic  models  for  estimating  the  amount  of  memory 
interference  in  multiprocessor  systems,  in  which  n  processors  access  m  memories 
independently. 

The processors are characterized by a typical processing time per memory
access and the memories by an access time (ta) and rewrite time (tw). Processor
behavior is simplified to an ordered sequence of a memory request followed by a
certain amount of processing. The predominant technique used involves discrete
time Markov chain models. Some simple exponential server models as well as
several approximate models are also presented. Simulation is used to evaluate
the accuracy of the approximate models. Some empirical measurements of the
PDP-11/20 are used to estimate the parameters of a model that is used to
predict the performance of C.mmp, Carnegie-Mellon University’s multiprocessor
computer, which will include up to 16 PDP-11 processors.

The models can be partitioned into three broad classes: tp=tw, tp>tw and
tp<tw,  where  tp  denotes  the  average  processing  time.  Systems  with  tp=tw  are 
described  first  because  they  represent  boundary  conditions  for  the  other  two 
cases. Different modeling techniques are examined for tp=tw and a reasonable
approximation is proposed. An important result observed is the absence of a law
of  diminishing  returns.  The  performance  of  a  multiprocessor  system  with  n 
processors  and  n  memories  continues  to  rise  at  a  constant  rate  as  n  increases.  A 
simple exponential server model showed this rate to be 0.5; a constant
processing time model predicted a slope of 0.586 for the average number of busy
Mp’s. The exponential server model gives the average number of busy Mp’s as
n*m/(n+m-1). An approximate result for constant processing times gives the
average number of busy Mp’s as i*(1-(1-1/i)^j), where i=max(n,m) and j=min(n,m).
An intuitively obvious conclusion limits the maximum number of active Pc’s by
min(n,m).

Markov  chain  models  are  also  developed  for  systems  with  tp>tw.  A  new  model 
for geometrically distributed processing time is developed. A different analytic
approach  is  used  to  model  systems  with  private  caches  for  the  processors.  In 
general,  since  the  Pc  is  slow,  it  takes  fewer  memory  units  for  the  performance 
to exhibit a saturation effect. In the absence of memory contention (which is
now possible even for m<n) the maximum memory access rate (MpAR) is n/(ta+tp).

With tp<tw, since the processor is fast, performance improvement is obtained
for m>n. If m=n, these systems do not yield significant improvement over systems
with tp=tw. In general, adding an extra memory improves the performance more
than adding an extra processor. The maximum average MpAR is the minimum of m/tc
and n/(ta+tp); the maximum is achieved if the processors do not interfere. Note
that  since  the  Pc  is  very  fast,  it  can  make  a  request  to  the  memory  that  served 
it  last  before  the  rewrite  cycle  is  over.  In  this  case,  the  Pc  has  to  wait  even 
though  no  other  Pc  is  being  serviced  by  the  memory  module. 


CHAPTER 1
INTRODUCTION 


In  the  design  of  new  computer  systems  there  exists  an  enormous  number  of 
alternative  decisions.  In  this  thesis  the  major  design  parameters  that  will  be 
allowed to vary are the number of processors (Pc’s)† and memories (Mp’s) and their
relative speeds. The quantitative approach to performance evaluation consists of
three major phases [GrenU72]:

(i)  Collection  of  data:  this  phase  involves  the  planning  and  conducting 
of  the  experiment  for  data  collection  as  well  as  techniques  for 
measurement. 

(ii)  Analysis  of  data:  this  phase  consists  of  construction  of  models  and 
estimation  of  parameters  in  the  models  as  well  as  validation  of  the  models. 

(iii)  Interpretation  of  data:  this  phase  concerns  the  summarization  of  the 
results  and  new  insights  gained  in  the  study  as  well  as  making  decisions 
based  on  the  results. 

The  emphasis  of  the  application  of  quantitative  methods  is  on  the  convergence  of 
two  aspects.  The  first  aspect  is  the  reliance  on  data  either  from 
experimentation on the real system or from simulation. The second is the use of
mathematical models. It is easy to collect massive amounts of confusing data
unless one has some model. On the other hand, a model without empirical
validation is at best an intellectual exercise.

†We use the PMS notation of Bell and Newell [BellC71a] in this thesis to
describe hardware organization.

Since  no  performance  measurements  of  the  actual  system  can  be  made  until  it 
has  been  designed,  implemented,  and  then  observed  over  a  long  period  of  time,  it 
becomes  necessary  to  use  analytic  and  simulation  models.  Analytic  models  enable 
the designer to explore a large design space quickly and rather economically.
However,  modeling  is  not  an  easy  task  and  it  is  often  necessary  to  simplify  the 
model  to  make  it  amenable  to  mathematical  analysis,  remembering  that  any 
mathematical  model  is  only  an  approximation  of  real-life  events.  If  the  system 
is  too  complex  to  allow  a  complete  analytic  study,  the  system  behavior  can  be 
modeled  at  various  levels  of  abstraction  in  a  hierarchical  fashion. 

Simulation offers a different approach: probabilistic emulation of a
mathematical model that portrays the aggregate behavior of the real system. We
gain  in  realism  since  we  are  no  longer  forced  to  impose  assumptions  for 
analytical  convenience.  However,  simulations  tend  to  be  expensive  if  a  high 
degree  of  realism  on  a  detailed  level  is  required.  Due  to  the  stochastic  nature 
of  simulation  results,  their  precision  can  be  measured  by  the  standard  deviation 
or  confidence  intervals  of  the  estimates  obtained.  The  confidence  interval  gets 
tighter as the size of the experiment is increased: the standard deviation of an
estimate is inversely proportional to the square root of the length of the run.
Hence, simulation studies are most valuable when focused on a small set of design
alternatives selected by analytic studies. Also, if the analytic techniques are
computationally expensive, simulation might well be more economical. Moreover,
it is easier to change the mathematical model in a simulation experiment.
Another important use for analytic models is as a control variable for improving
the efficiency of simulation experiments by reducing the variance of parameter
estimates from simulation experiments [GaveD71].

Figure 1.1 A simple block diagram of a multiprocessor system.
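
The slow gain in precision with run length can be illustrated by a small experiment. The sketch below is only an illustration under assumed conditions: the observations are independent uniform random numbers rather than output from a memory interference simulator, and the function names and the replication count are hypothetical. For independent observations the standard deviation of a sample mean falls as one over the square root of the number of observations, so quadrupling the run length should roughly halve it.

```python
import random
import statistics


def estimate_mean(run_length, rng):
    """Return the sample mean of run_length independent Uniform(0,1) draws."""
    return sum(rng.random() for _ in range(run_length)) / run_length


def std_of_estimate(run_length, replications=400, seed=1):
    """Empirical standard deviation of the estimate over independent replications."""
    rng = random.Random(seed)
    estimates = [estimate_mean(run_length, rng) for _ in range(replications)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    # Each fourfold increase in run length should roughly halve the spread.
    for run_length in (100, 400, 1600):
        print(run_length, round(std_of_estimate(run_length), 4))
```

Halving the uncertainty therefore costs four times the computation, which is why a simulation study is best reserved for a few candidate designs.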

Mathematical  models  of  computer  systems  can  be  developed  at  various  levels 
of  abstraction.  A  large  number  of  models  for  time-sharing  systems  consider  a  job 
as a basic unit [MckiJ69], and in many models of multiprogrammed computer systems
the block of instructions between I/O operations is taken as a basic
unit [BuzeJ71; GaveD67]. However, in this study a much more detailed model is
used to analyze interference as processors access individual words from the
memory modules. Each processor’s performance is measured by the number of memory
accesses per unit time. In a multiprocessor system the performance of each Pc is
not independent of the behavior of the other Pc’s. A simple block diagram of a
multiprocessor system is shown in Fig. 1.1. The connecting network or switch
provides a path from each of the n Pc’s to each of the m Mp’s, such that a
connection between Pc[i] and Mp[j] does not hamper a connection between Pc[k]
and Mp[l], where i≠k and j≠l. The processors contend with each other for memory
service.  This  contention  is  referred  to  as  memory  interference.  This  thesis 
presents  a  set  of  techniques  for  determining  the  extent  of  memory  interference 
as  measured  by  the  average  number  of  busy  memories  or  the  rate  at  which  the  Mp 
is  accessed. 

Legend:

1 instruction fetch
2 instruction decoding
3 operand fetch
4 instruction execution
5 next instruction fetch

ta  memory access time
tw  memory restore time
td  instruction decode time
tei processor execution time

Figure 1.2a: An example of the timing of a typical instruction.

Figure 1.2b: Simplified processor behavior: unit instruction.
Two units model the instruction shown in Fig. 1.2a.




1.1  GENERAL  MODELING  ASSUMPTIONS 


Due  to  the  complexity  of  the  problem,  the  exact  detailed  behavior  of  memory 
interference  in  a  multiprocessor  system  is  difficult  to  model.  Some  of  the 
parameters  that  characterize  the  behavior  of  a  Pc  are: 

(i) Instruction mix: Instructions can be characterized by their relative
frequency. In general, processor behavior varies for different
instructions. However, in this thesis differences in instructions are not
modeled explicitly. Processor behavior is modeled as an ordered sequence,
consisting of a memory request followed by a certain amount of execution
time. At this level of abstraction no distinction is made between the
processing needed to decode an instruction and the processing
corresponding to its execution. Thus, the processing time characterizing
a Pc depicts only the aggregate behavior of the real Pc. Figure 1.2
depicts the actual and abstracted behaviors. A typical unit instruction‡
is shown in Fig. 1.2b.

(ii) Probability distribution of the processing time: Instructions are
characterized by their processing time. Typical programs are measured to
find the probability distribution of the instruction processing time.

(iii) Average processing time: This is obtained from measurements similar
to those used for determining the probability distribution.

(iv) Access pattern of a Pc: This is the trace of the pages or memory
locations accessed by the Pc. In this study serial correlation between
successive memory accesses will be ignored; this is not a very serious
assumption since data and instruction references are intermingled. Demand
patterns will be modeled as sequences of Bernoulli trials. Memory accesses
will be characterized by the memory unit to which they are addressed. Let pij
denote the probability that the i-th processor requests service from the
j-th memory unit. Thus, the demand pattern of each processor is
equivalent to a sequence of Bernoulli trials. Unless otherwise specified,
pij will be assumed to be equal to 1/m, where m is the number of Mp’s.

‡The concept of a unit instruction was first proposed by Strecker [StreW70].
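
The uniform access pattern can be made concrete with a short sampling sketch. It is an illustration only and not one of the models developed in this thesis: it assumes that all n processors issue a fresh request in the same cycle, with no requests carried over from earlier cycles, and the function names are hypothetical. Under that assumption the expected number of distinct memory units requested is the standard occupancy result m*(1-(1-1/m)^n), which the sampled average should approach.

```python
import random


def busy_modules_one_cycle(n, m, rng):
    """Each of n processors picks one of m memory units with probability 1/m.

    Returns the number of distinct units that received at least one request.
    """
    return len({rng.randrange(m) for _ in range(n)})


def average_busy(n, m, cycles=20000, seed=7):
    """Sampled average number of requested units over independent cycles."""
    rng = random.Random(seed)
    return sum(busy_modules_one_cycle(n, m, rng) for _ in range(cycles)) / cycles


def expected_distinct(n, m):
    """Occupancy expectation for n independent uniform requests over m units."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


if __name__ == "__main__":
    n, m = 4, 4
    print(average_busy(n, m), expected_distinct(n, m))
```

Even with as many memory units as processors, a sizeable fraction of the units receives no request in a given cycle; this is the interference that the later chapters quantify.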

The  effect  of  I/O  activity  will  not  be  modeled  explicitly. 
Strecker [StreW70] has shown that if the rate of I/O requests is RIO, then a
fraction RIO/(m*tc) of the memory access rate can be apportioned to I/O.

A processor is said to be queued if it is waiting for or in the process of
receiving  memory  service.  A  processor  is  said  to  be  active  if  it  is  currently 
being  serviced  by  a  memory.  Likewise,  a  memory  is  said  to  be  occupied  or  busy  if 
there  is  at  least  one  processor  queued  for  that  memory  unit. 

Primary  memory  behavior  is  a  function  of  the  fabrication  technology,  i.e. 
core  or  semiconductor.  Memory  performance  can  be  characterized  by  the  access 
time  (ta),  rewrite  time  (tw),  and  cycle  time  (tc).  Nominally,  the  cycle  time  is 
the sum of the other two. In this study, no distinction is made between read and
write operations. The effect of interleaving within an Mp module is to make the
access and cycle times seen by a Pc variable. Most of the models in this thesis
will use the average values.

Figure 1.3 Structure of the queueing model


The  processing  time  shown  in  Fig.  1.2  is  the  effective  processing  time 
measured  from  the  time  when  the  Pc  gets  the  data  from  its  last  memory  access  to 
the time when the next memory request reaches the memory. Thus, the delays
associated with the address mapping and communication protocol needed to make a
request are attributed to the Pc. The memory access time includes the time
required  to  set  up  the  switch  for  the  data  transfer.  If  the  memory  control  unit 
introduces  some  delay,  then  that  is  also  added  to  the  access  time.  A  more 
detailed  description  is  presented  in  Chapter  5. 


1.2  MODELING  CONCEPTS 

A  queueing  model  will  be  used  to  analyze  memory  interference.  Figure  1.3 
shows  the  basic  structure  of  the  model.  All  time  delays  are  modeled  as  service 
centers  and  there  is  one  job  in  the  queueing  system  for  every  Pc.  In  real 
systems  the  Pc  can  start  execution  while  the  Mp  is  in  its  rewrite  cycle.  If 
tp>tw  the  service  time  of  the  memory  will  be  assumed  to  be  tc,  and  the  effective 
service time of the Pc will be assumed to be tp-tw. This simplification allows
the Mp to start serving the next job as soon as the last job has left. The
overall system behavior is unaltered by this simplification. However, when tp<tw
the queueing model cannot be used for reasons explained in detail in Chapter 4.

The  number  of  Pc's  will  be  denoted  by  n  and  the  number  of  Mp’s  by  m.  The 
multiprocessor system will be referred to as an n×m system. The state of the
system will be denoted by a vector describing the sizes of the various queues.
The major technique used in this thesis involves Markov chains [ParzE62]. A brief
review  of  some  of  the  definitions  and  concepts  is  presented  here. 

A stochastic process is a family of random variables {X(t), t ∈ T} indexed by a
parameter t varying in an index set T. The stochastic process is a discrete
parameter process if T = {0,1,2,...} or {0,±1,±2,...}. The process is a continuous
parameter process if T = {t ≥ 0} or {-∞ < t < ∞}.

A discrete parameter stochastic process {X(t), t = 0,1,2,...} or a continuous
parameter stochastic process {X(t), t ≥ 0} is said to be a Markov process if, for
any set of n points $t_1 < t_2 < \cdots < t_n$ in the index set of the process, the
conditional distribution of $X(t_n)$, for given values of $X(t_1), \ldots, X(t_{n-1})$,
depends only on $X(t_{n-1})$, the most recent known value; more precisely, for any
numbers $x_1, \ldots, x_n$,

$$P[X(t_n) \le x_n \mid X(t_1) = x_1, \ldots, X(t_{n-1}) = x_{n-1}] = P[X(t_n) \le x_n \mid X(t_{n-1}) = x_{n-1}]$$

Intuitively,  this  means  that,  given  the  present  of  the  process,  the  future  is 
independent  of  its  past.  The  set  of  possible  values  of  a  stochastic  process  is 
called  its  state  space.  The  state  space  is  called  discrete  if  it  contains  a 
finite or countably infinite number of states. A state space which is not
discrete  is  called  continuous.  A  Markov  process  whose  state  space  is  discrete  is 
called  a  Markov  chain. 

A Markov process is described by a transition probability function, denoted
by P(E,t | x,t0), which represents the conditional probability that the state of
the system will at time t belong to the set E, given that at time t0 (<t) the
system is in state x. The Markov process is said to have stationary transition
probabilities, or to be homogeneous in time, if P(E,t | x,t0) depends on t and
t0 only through the difference (t-t0).

A Markov chain is irreducible if every state can be reached from every
other state, not necessarily in one step. The period of a state i is defined as
the greatest common divisor of all integers k such that the probability of
returning to state i in k steps is greater than 0. A state of an irreducible
Markov chain is aperiodic if it has period 1. A Markov chain is aperiodic if
every state in its state space is aperiodic.

A  discrete  parameter  irreducible  aperiodic  Markov  chain  that  has  stationary 
transition  probabilities  possesses  a  stationary  state  probability  distribution. 
Let  Z(k)  denote  the  steady  state  probability  of  state  k.  Then, 

$$Z(k) = \sum_{j} \mathrm{TRANS}(k,j)\, Z(j)$$

where TRANS(k,j) is the one step transition probability from state j to state k
and the sum runs over all states j.
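
The steady state equations can be solved numerically by repeated application of the transition matrix. The sketch below is a minimal illustration, not the solution procedure used in later chapters: the two-state matrix is a made-up example, the function name is hypothetical, and the indexing follows the convention above, with TRANS[k][j] the probability of moving from state j to state k, so that each column sums to one.

```python
def stationary_distribution(trans, iterations=1000):
    """Iterate Z <- TRANS * Z from a uniform start.

    trans[k][j] is the one step probability of moving from state j to
    state k. For an irreducible aperiodic finite chain the iteration
    converges to the steady state probabilities.
    """
    size = len(trans)
    z = [1.0 / size] * size
    for _ in range(iterations):
        z = [sum(trans[k][j] * z[j] for j in range(size)) for k in range(size)]
    return z


# Hypothetical two-state chain used only to exercise the function.
TRANS = [
    [0.9, 0.5],
    [0.1, 0.5],
]

if __name__ == "__main__":
    print(stationary_distribution(TRANS))
```

For this example the balance equation Z(0) = 0.9*Z(0) + 0.5*Z(1), together with Z(0) + Z(1) = 1, gives Z(0) = 5/6 and Z(1) = 1/6, which the iteration reproduces.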

Transition  probabilities  from  a  current  state  to  a  next  state  will  be 
evaluated for the irreducible aperiodic discrete Markov chain models in the
forthcoming chapters. The steady state probabilities will be used to calculate
the average number of busy Mp’s, which is equal to the number of unit
instructions executed in one memory cycle. The unit instruction execution rate
(UER) or memory access rate (MpAR) is obtained by dividing the average number of
busy Mp’s by the cycle time.

1.3  EXTANT  MULTIPROCESSOR  SYSTEMS 

A group at Carnegie-Mellon University is currently in the process of
constructing a multiprocessor computer system (C.mmp) that will have up to
sixteen central processors (PDP-11/20’s) sharing the same physical address space
[BellC71b; WulfW72], and concern has been expressed about the performance of such
a  system  with  this  many  active  processors.  The  models  developed  in  this  thesis 
will  be  used  to  predict  the  performance  of  C.mmp  in  Chapter  5.  Figure  1.4 
illustrates  the  major  components  of  a  multiprocessor  such  as  C.mmp.  In  addition 
to  the  processors,  there  is  a  set  of  memory  modules  that  are  able  to  operate 
independently;  little  would  be  gained  if  all  the  processors  had  to  wait  for 
service  from  a  single  memory  module.  Thus,  between  the  processors  and  the  memory 
modules  (Mp’s)  is  an  n  by  m  crosspoint  switch,  which  allows  any  Pc  to  access  any 
Mp. There are a number of ways of implementing the switch; Fig. 1.5(a) depicts
an n by m crosspoint switch, and Fig. 1.5(b) illustrates the use of trunk lines;
combinations of these two basic schemes can yield many other schemes.
Other multiprocessors, although limited to a small number of Pc’s, i.e. two to
four, also basically use a crosspoint switch, e.g. the Burroughs D825 [AndeJ62]
and Univac 1110. For further discussion of trunk lines, and a variety of other
switching structures, the reader is referred to Bell and Newell [BellC71a].

where: Pc/central processor; Mp/primary memory; T/terminal;
Ks/slow device control (e.g., for Teletype);
Kf/fast device control (e.g., for disk);
Kc/control for clock, timer, interprocessor communication;
Dmap/relocation registers for mapping Pc address into Mp
address space.
Both switches have static configuration control by manual and
program control.

Fig. 1.4 Proposed CMU multiminiprocessor computer/C.mmp.

1.4  COMMENTS  ON  EARLIER  WORK 

A  review  of  current  literature  shows  very  few  models  of  memory 
interference.  Skinner  and  Asher  [SkinC69]  proposed  a  discrete  Markov  chain  model 
for  multiprocessor  systems  with  tp=tw.  The  analysis  was  presented  for  a  small 
number of Pc’s (≤2). However, for larger systems the complexity of the problem
deterred  the  authors  from  further  pursuit  of  an  analytic  solution. 

Strecker [StreW70] developed a set of simple approximate models. Most of his
modeling assumptions are similar to those used in this thesis. While the
analysis of Skinner and Asher is rigorous and exact, Strecker’s analysis is
approximate. In this thesis, an exact analysis of a discrete Markov chain model
for systems with tp=tw is presented. As expected, the exact analysis is very
complex. However, the results of the exact analysis suggest more reasonable
approximations that yield performance estimates that are more accurate than
Strecker’s. For instance, the exact discrete Markov chain model described in
Chapter 2 shows that the MpAR values for a j×k and a k×j multiprocessor system
with tp=tw are almost equal, a result not apparent from Strecker’s work. Also,
Strecker’s formula for tp=tw is more accurate for m>n than for m<n; n is the
number of Pc’s and m the number of Mp’s.

More  detailed  descriptions  of  the  works  of  Skinner  and  Asher,  and  Strecker 
can  be  found  in  Chapters  2,  3  and  4.  Bhatia[BhatS72]  has  shown  how  the  results 
from  memory  interference  models  can  be  used  as  data  for  models  of  timeshared 
multiprocessor  systems  at  the  user  program  level. 

A  major  contribution  of  this  thesis  is  a  systematic  approach  to  the  use  of 
the  Markov  chain  technique  for  analyzing  memory  interference  in  multiprocessor 
systems.  The  exact  analysis  of  the  Markov  chain  is  complex.  However,  the 
behavior  observed  from  the  exact  analysis  is  used  to  examine  an  approximate 
solution technique that is computationally simpler. Though some of the models
presented in this thesis may be only marginally more accurate (5%) than
Strecker’s results, they may result in much more accurate estimates when used as
inputs to other models such as Bhatia’s model for time-shared systems. For
example, the mean waiting time (including service) for a single server queueing
system with Poisson input rate λ and exponential service at rate μ is 1/(μ-λ).
A 5% error in the value of μ can cause a much greater error in the estimated
waiting time if λ is close to μ.
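
The amplification of a small error in the service rate can be checked with a few lines of arithmetic. The sketch below is illustrative only; the rates are made-up values and the function names are hypothetical. It evaluates 1/(μ-λ) for the true service rate and for a service rate underestimated by 5%, and reports the relative error in the resulting waiting time.

```python
def waiting_time(mu, lam):
    """Mean time in a single server queue with Poisson arrivals at rate lam
    and exponential service at rate mu (requires lam < mu)."""
    if lam >= mu:
        raise ValueError("the queue is unstable unless lam < mu")
    return 1.0 / (mu - lam)


def relative_error_in_wait(mu, lam, rel_err_mu):
    """Relative error in the waiting time when mu is underestimated by the
    fraction rel_err_mu."""
    exact = waiting_time(mu, lam)
    perturbed = waiting_time(mu * (1.0 - rel_err_mu), lam)
    return (perturbed - exact) / exact


if __name__ == "__main__":
    # Light load: a 5% error in mu gives roughly an 11% error in the wait.
    print(relative_error_in_wait(1.0, 0.5, 0.05))
    # Heavy load: the same 5% error roughly doubles the estimated wait.
    print(relative_error_in_wait(1.0, 0.9, 0.05))
```

With μ = 1 and λ = 0.9 the true value is 10 time units, while the estimate with μ = 0.95 is 20, an error of about 100% caused by a 5% error in the input.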

The analytic models are described in detail in Chapters 2, 3 and 4. The
models are mathematical abstractions of the real systems. Thus, they do not
exactly reflect the true behavior of the physical system. However, the models
will be referred to as exact or approximate depending on the quality of the
technique used to analyze the mathematical model.

The models can be grouped into three broad classes: tp=tw, tp>tw and
tp<tw, where tp denotes the average processing time. Systems with tp=tw are
described first because they represent boundary conditions for the other two
cases. Different modeling techniques are examined for tp=tw and a reasonable
approximation is proposed. A casual reader may find it useful to glance through
the empirical results and validation of the models presented in Chapter 5 and
the concluding remarks summarized in Chapter 6 before examining the mathematical
intricacies of Chapters 2, 3 and 4. A more detailed summary of the thesis
content is given below. Table 1.1 summarizes the salient characteristics of the
various analytic models.

Chapter 2 is devoted to multiprocessor systems with tp=tw. A simple
exponential server model provides some insight into the effect of adding a
processor or a memory to the system. A more elaborate analysis for constant
processing time uses discrete Markov chain techniques; an exact but unwieldy
analysis and a simple approximate analysis are presented. This exact analysis of
the Markov chain model is compared with Strecker’s approximation. A new
approximate model is introduced to analyze the effect of skewing the access
patterns of the processors so that each has a greater preference for a different
memory module.

Table 1.1 Salient Characteristics of Various Analytic Models

Model Descriptor               Processing    Memory Cycle  Remarks
                               Time          Time

MULTIPROCESSOR SYSTEMS WITH tp=tw

Continuous Time Markov Chain   exponential   exponential   Jackson’s formulae are used to
                                                           obtain a simple closed form solution.

Exact Discrete Markov Chain    constant      constant      The solution is algorithmic.
                                                           Unwieldy for large systems.

Approx. Discrete Markov Chain  constant      constant      Approximation: non-active Pc’s are
                                                           reassigned to busy Mp’s at the end
                                                           of the cycle. Good for n≤m.

Strecker’s Approximation       constant      constant      Simple closed form solution. Less
                                                           accurate than above. Non-active Pc’s
                                                           are reassigned to all Mp’s.

Skinner and Asher’s            constant      constant      Exact discrete Markov chain analysis
Discrete Markov Chain                                      for up to 2 Pc’s and m Mp’s.

Approximate Model for          constant      constant      pij is not restricted to 1/m.
Arbitrary pij                                              Solution is simple but approximate.

MULTIPROCESSOR SYSTEMS WITH tp>tw

Discrete Markov Chain          constant      constant      Approximate analysis is presented.
for tp=tw+tc                                               Queued Pc’s are reassigned to all
                                                           Mp’s at the end of the cycle.

Discrete Markov Chain for      geometric     constant      Approximate analysis.
Geometrically Distributed tp                               Prob[tp = tw + i*tc] decreases
                                                           geometrically with i.

McCredie’s Exponential Server  exponential   exponential   Jackson’s formulae are used. One
                                                           Mp has different tc and different
                                                           access probability. Cache.

Strecker’s Analysis            constant      constant      Little’s formula is used.

Approximate Model for          constant      constant      Little’s formula is used.
Systems with cache                                         Cache memory speed is a parameter.

MULTIPROCESSOR SYSTEMS WITH tp<tw

An Approximate Model           constant      constant      Model for tp=tw is used to obtain
                                                           the conditional probability of Pc’s
                                                           second request going to an idle Mp,
                                                           depending on the number of busy Mp’s.

Strecker’s Approximation       constant      constant      Results of tp=tw are used to find
                                                           the probability of request to an
                                                           idle Mp. Average number of busy
                                                           Mp’s is used.

Chapter  3  presents  discrete  Markov  chain  models  for  tp>tw.  Techniques  for 
an  exact  analysis  of  the  models  are  introduced  and  some  approximations 
suggested. Models are developed for constant processing time. McCredie’s
exponential server model [McCrJ73] and Strecker’s approximate model for constant
tp are discussed. Two new models for analyzing the effect of cache memories are
also  described. 

Chapter  4  contains  an  approximate  model  for  tp<tw  and  compares  the  results 
with  Strecker’s  model. 

Chapter 5 contains the results of some empirical measurements of PDP-11
programs.  The  processing  time  distribution  is  evaluated  from  these  measurements. 
The  process  of  extracting  the  abstract  model  parameters  from  the  real  physical 
system  behavior  is  demonstrated.  Predictions  are  made  about  the  performance  of 
Carnegie-Mellon  University’s  C.mmp  and  compared  with  some  actual  measurements. 
This  preliminary  comparison  with  actual  measurements  shows  the  accuracy  and 
utility  of  analytic  models. 

Chapter  6  summarizes  the  salient  results  developed  in  the  thesis.  An 
example of the use of these models for examining design alternatives is also
included.  Some  directions  for  future  work  are  discussed. 


CHAPTER  2 


MULTIPROCESSOR SYSTEMS WITH TP=TW


In this chapter multiprocessor systems with tp=tw will be analyzed. This
could happen even with a very fast Pc. If the system is bus-bound and the Pc-Mp
bus recovers at the same time that the memory is ready to service the next
request, then the effective processing time (as seen by the memory) is equal to
the memory rewrite time. With tp=tw, the analysis is simpler than with tp<tw and
tp>tw. Also, it is a boundary condition for the other two cases. Thus, tp=tw is
an interesting case for a preliminary comparison of various modeling techniques,
even when tp is not equal to tw in reality.

A simple exponential server model provides some insight into the effect of
adding a processor or a memory to the system. A more elaborate analysis for
constant processing time uses discrete Markov chain techniques; an exact but
unwieldy analysis and a simple approximate analysis are presented. This exact
analysis of the Markov chain model is compared with Strecker’s approximation.
The results of the exact analysis are used to improve the accuracy of the
approximate analysis. A new approximate model is introduced to analyze the
effect of skewing the access patterns of the processors so that each has a
greater preference for a different memory module. A diffusion approximation is
also considered.


2.1 CONTINUOUS TIME MARKOV CHAIN MODEL


In our first model, we apply the classic simplifying assumption in queueing
models: we model the service time, or cycle time, of the memory modules as
exponentially distributed random variables [cf. WagnH69]. Clearly most memory
systems do not have an exponentially distributed cycle time. However, techniques
such as interleaving, cache memories, and the type of memory access (read, write,
read-modify-write) suggest that this exponential assumption may be as good an
approximation as the assumption that the memory cycle time is fixed, and not
variable at all. Without further assumptions or approximations, we can use the
results of Jackson [JackJ63], and Gordon and Newell [GordW67] to find the
performance of the multiprocessor system. This technique is also used by
McCredie [McCrJ73] for multiprocessors with tp>tw. This exponential server model
is the simplest model to analyze. It also gives some basic insight into the
extent of memory interference when the system has a large number of Pc’s and
Mp’s.

Let the number of service centers be m. The states of the system are
m-dimensional vectors with non-negative integer components, the j-th component
representing the queue length at center j. If $K = (k_1, k_2, \ldots, k_m)$ is a state
vector, then let $S(K) = \sum_{i=1}^{m} k_i$. Transition from one center to another is
characterized by a routing probability $r_{ij}$, i.e. the probability of going to
center j on completion of service at center i. Jackson [JackJ63] has obtained
the equilibrium joint probability distribution of queue lengths for a broad
class of queueing-theoretical models representing a network of service centers.
Customer arrivals are modeled as a generalized Poisson process [cf. WagnH69],
whose mean arrival rate varies almost arbitrarily with the total number of
customers already in the system. Service completions at each center are also
modeled as generalized Poisson processes, the mean service rate (u) at each
center varying arbitrarily with the queue length there. Note that in Jackson’s
model all customers are identical. Muntz and Baskett [MuntR72] have a more
general queueing network model that allows different classes of customers to
have different branching probabilities. Gordon and Newell [GordW67] have
presented a solution technique for closed queueing systems, i.e. networks of
queues in which the number of customers is constant.

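For closed queueing systems, Jackson’s formulae for obtaining the
equilibrium state probabilities are listed below.

$$P(K) = \frac{w(K)}{T(S(K))}$$

where,

$$w(K) = \prod_{j=1}^{m} \prod_{i=1}^{k_j} \frac{e(j)}{u_j(i)}$$

$$e(j) = \sum_{i=1}^{m} e(i)\, r_{ij}, \qquad j \in [1,m]$$

$$T(n) = \sum_{K:\, S(K)=n} w(K)$$

Here $u_j(i)$ is the mean service rate at center j when the queue length there is i.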

But, with Pc requests distributed uniformly and with the bus-bound
situation or tp=tw, the exponential server model reduces to m servers with
customers circulating with uniform routing probabilities, i.e. $r_{ij} = p_{ij} = 1/m$.
Here e(j) denotes the average frequency of visits to service center j, which is
then the same for every center. Using the above formulae with a common service
rate u we get,

$$w(K) = (1/u)^n$$

$$T(n) = \binom{n+m-1}{m-1} (1/u)^n$$

$$P(K) = \left[\binom{n+m-1}{m-1}\right]^{-1} \qquad \text{for all } K \text{ such that } \sum_{i=1}^{m} k_i = n$$

i.e. all the states of the system have equal probability. Physically, this
indicates that states with greater congestion in the queues are as likely as
evenly distributed queues. Note that the above analysis holds even when
successive memory requests are correlated as long as the average access
frequency $P_{ij} = 1/m$. The probability that a particular Mp module is idle,
Prob{Mp[i] is idle}, is the fraction of the total number of states that has
$k_i = 0$.

In other words,

$$\text{Prob}\{Mp[i] \text{ is idle}\} = \frac{\text{number of ways of assigning } n \text{ Pc's to } m-1 \text{ Mp's}}{\text{number of ways of assigning } n \text{ Pc's to } m \text{ Mp's}} = \frac{\binom{n+m-2}{m-2}}{\binom{n+m-1}{m-1}} = \frac{m-1}{n+m-1}$$

Therefore,

$$\text{Prob}\{Mp[i] \text{ is busy}\} = \frac{n}{n+m-1}$$

$$E[\text{number of busy Mp's}] = m \cdot \text{Prob}\{Mp[i] \text{ is busy}\} = \frac{m\,n}{m+n-1}$$

The above expression has a number of interesting properties: the expression
is symmetric in m and n; it has a basic hyperbolic form, asymptotic to n as m
gets large; and, if we let m=n the above expression becomes

$$\frac{n}{2 - 1/n}$$

and

$$E[\text{number of busy Mp's}] \to n/2 \qquad \text{for } n \gg 1$$

The final observation has important implications. It states that as
multiprocessor systems grow to include more and more Pc’s, we are not faced with
a law of diminishing returns: no matter how many Pc’s are used, if we have the
same number of memory modules, we can expect half the processors to be active.
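The equal-probability result of this section can be checked numerically. The sketch below is only an illustration, not part of the original analysis: it enumerates every queue-length vector for small n and m, treats all of them as equally likely (as the exponential server model predicts for uniform routing), and compares the resulting average number of busy Mp’s with the closed form m*n/(m+n-1). The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def states(n, m):
    """Yield every vector (k_1..k_m) of non-negative integers summing to n.

    Uses the stars-and-bars construction: choose m-1 bar positions among
    n+m-1 slots; the gaps between bars are the queue lengths.
    """
    for bars in combinations(range(n + m - 1), m - 1):
        prev = -1
        state = []
        for b in bars + (n + m - 1,):
            state.append(b - prev - 1)
            prev = b
        yield tuple(state)


def expected_busy_enumerated(n, m):
    """Average number of non-empty queues when all states are equally likely."""
    all_states = list(states(n, m))
    # The number of states must agree with the binomial coefficient in the text.
    assert len(all_states) == comb(n + m - 1, m - 1)
    busy = sum(sum(1 for k in s if k > 0) for s in all_states)
    return busy / len(all_states)


def expected_busy_closed_form(n, m):
    """Closed form from the exponential server model: m*n/(m+n-1)."""
    return m * n / (m + n - 1)


if __name__ == "__main__":
    for n, m in [(2, 2), (4, 4), (3, 5), (8, 8)]:
        print(n, m, expected_busy_enumerated(n, m), expected_busy_closed_form(n, m))
```

For n=m the two columns agree and approach n/2 as n grows, which is the behaviour described above.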


2.2  A  SIMPLE  DISCRETE  MARKOV  CHAIN  MODEL 


For  this  analysis  let  us  assume  that  all  the  Pc’s  are  characterized  by  a 
single  constant  processing  time  tp.  In  this  model,  the  memory  access  and  cycle 
time  are  constant.  The  exponential  server  model  discussed  above  allows  the 
memory  cycle  time  to  have  a  large  range  of  values.  However,  though  the  cycle 
time  is  not  a  constant  (as  seen  by  a  Pc)  it  certainly  does  not  have  an 
exponential  distribution.  The  constant  service  time  model  is  an  attempt  to 
de-emphasize the small variance in the value of the cycle time. Although the
processing time is not a constant in reality, this approach yields fairly good
estimates of the MpAR as substantiated in section 2.7. Also, all the memory
units are assumed to have the same cycle time tc and access time ta. Thus, the
memory rewrite time is given by tw=tc-ta. If tp=tw then all memory units can be
considered to be operating synchronously. Thus, during any memory cycle the
number of active Pc’s is equal to the number of busy Mp’s.


In this section, a simple Markov chain analysis is presented for the case
in which the processors request every memory with equal likelihood. A
multiprocessor system with n Pc’s and m Mp’s is likened to an occupancy problem
with n balls and m urns. Balls are randomly assigned to the m urns at the
beginning of a memory cycle. At the end of the cycle one ball is removed from
each urn. Thus if there are k non-empty urns during cycle s then k balls are
available for assignment during the (s+1)-th cycle.
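The urn process described above is easy to simulate directly. The following sketch is an illustration under stated assumptions (a seeded pseudo-random generator, a fixed number of cycles, and a hypothetical function name); it is not the simulation model of section 2.7. It estimates the average number of non-empty urns, i.e. busy Mp’s, per memory cycle.

```python
import random


def simulate_busy(n, m, cycles=100_000, seed=1):
    """Monte Carlo estimate of the average number of busy Mp's per cycle.

    n balls (Pc's) and m urns (Mp's): free balls pick an urn uniformly at
    random at the start of a cycle, and one ball leaves each non-empty urn
    at the end of the cycle.
    """
    rng = random.Random(seed)
    queues = [0] * m
    free = n              # initially every Pc is ready to issue a request
    total_busy = 0
    for _ in range(cycles):
        for _ in range(free):
            queues[rng.randrange(m)] += 1
        busy = 0
        for j in range(m):
            if queues[j] > 0:
                queues[j] -= 1      # one request is serviced per busy Mp
                busy += 1
        total_busy += busy
        free = busy                 # only serviced Pc's make new requests
    return total_busy / cycles


if __name__ == "__main__":
    print(simulate_busy(2, 2))
    print(simulate_busy(4, 4))
```

With enough cycles the estimates settle close to the values obtained from the exact Markov chain analysis of this section.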


The state of the above mentioned process is defined by an m-tuple
$\langle k_1, k_2, \ldots, k_m \rangle$, where $\sum_i k_i = n$ and $0 \le k_i \le n$ for all i. The number of
distinct states of the system is given by the combination $\binom{n+m-1}{m-1}$, i.e. the
number of ways in which n identical balls can be assigned to m bins [FellW66].
However, since all the processors behave identically, a number of the distinct
states are equivalent, i.e. they have the same occupancy and have the same
components, e.g. states (2,1,1), (1,2,1), (1,1,2) are equally likely. Thus, the
reduced states are given by the different ways in which the number n can be
partitioned into m parts, i.e. the unordered integer solutions to the equation
$\sum_i x_i = n$ for $0 \le x_i \le n$ represent equivalence classes of equally likely states.
The number of such partitions (for n≤m) is asymptotic to

$$\frac{1}{4n\sqrt{3}} \exp\left[\pi (2n/3)^{0.5}\right] \qquad \text{[cf. BeckE64]}$$

Also,

$$F(x) = \frac{1}{(1-x)(1-x^2)\cdots(1-x^k)} = 1 + \sum_i p(i)\, x^i$$

is an ordinary generating function of the sequence (p(0), p(1), ..., p(k)),
where p(i) denotes the number of partitions of the integer i that have no part
exceeding k, k≤i [LiuC68]. Table 2.1 shows the total number of states and the
number of reduced representative states as a function of n.

TABLE 2.1

SOME PROPERTIES OF THE DISCRETE MARKOV CHAIN MODEL

Number of Pc’s      Total Number     Reduced     Execution time
Number of Mp’s      of States        States      for program

       2                     3           2        < 1 sec.
       4                    35           5        < 1 sec.
       8                  6435          22          2 sec.
      10                 92378          42          8 sec.
      12               1352078          77          1 min.
      16             300540195         231          1 hour
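The two state counts discussed above can be reproduced with a few lines of code. This is a sketch with hypothetical function names: the total count is the binomial coefficient, and the reduced count is obtained by expanding the generating function 1/((1-x)(1-x^2)...(1-x^m)) term by term. It relies on the standard fact that the number of partitions of n into at most m parts equals the number of partitions of n with no part exceeding m.

```python
from math import comb


def total_states(n, m):
    """Number of ways to place n identical balls in m bins."""
    return comb(n + m - 1, m - 1)


def reduced_states(n, m):
    """Number of partitions of n into at most m parts.

    Computed as the coefficient of x^n in 1/((1-x)(1-x^2)...(1-x^m)),
    multiplying in one factor 1/(1-x^part) at a time.
    """
    p = [1] + [0] * n
    for part in range(1, m + 1):
        for t in range(part, n + 1):
            p[t] += p[t - part]
    return p[n]


if __name__ == "__main__":
    for n in (2, 4, 8, 10, 12, 16):
        print(n, total_states(n, n), reduced_states(n, n))
```

Run for n=m, this reproduces the state counts listed in Table 2.1.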


Let the representative state Si denote the set of compositions of the
number n that yield the same partition, e.g. the compositions (2,1,1), (1,2,1)
and (1,1,2) correspond to the partition of the number 4 which has two 1’s and
one 2. Further, let Si.j be the individual compositions of the partition
typified by representative state Si and Si.1 be that composition which has its
components arranged in monotone non-increasing order, i.e. (2,1,1) for the above
example. The algorithm shown in Fig. 2.1 generates all the partitions of n with
the components in monotone non-increasing order.
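The body of Fig. 2.1 is not reproduced in this text, so the following is only a sketch of one way to generate the representative states, not the thesis algorithm itself. The hypothetical function `partitions` yields every partition of n into at most m parts as an m-tuple padded with zeros, with components in monotone non-increasing order.

```python
def partitions(n, m, largest=None):
    """Yield partitions of n into at most m parts, largest part first.

    Each partition is an m-tuple padded with zeros, e.g. (2, 1, 1, 0).
    `largest` bounds the size of the next part so the tuple stays
    non-increasing.
    """
    if largest is None:
        largest = n
    if m == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n, largest), -1, -1):
        # The remaining n-first units must fit into m-1 parts of size <= first.
        if first * (m - 1) < n - first:
            break
        for rest in partitions(n - first, m - 1, first):
            yield (first,) + rest


if __name__ == "__main__":
    for p in partitions(4, 4):
        print(p)
```

For a 4 by 4 system this lists the five representative states (4,0,0,0), (3,1,0,0), (2,2,0,0), (2,1,1,0) and (1,1,1,1).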

Let $X_{ij}$ denote the probability of a transition from Sj to Si. Then, due to
the symmetry of the problem,

$$X_{ij} = \sum_{Si.k \,\in\, Si} \text{Prob}\{\text{Transition from } Sj.1 \text{ to } Si.k\}$$

[Figure 2.1  An algorithm for generating partitions]

Let the m-tuple $\langle k_1, k_2, \ldots, k_m \rangle$ denote the state of the Markov chain. If x
is the number of non-zero elements in this vector then at the end of the memory
cycle, x new processors have to be reassigned to memory modules. At the end of
the current memory cycle the queue is characterized by the partial state m-tuple
$(j_1, j_2, \ldots, j_m)$, where

$$j_i = \begin{cases} k_i - 1 & \text{if } k_i > 0 \\ 0 & \text{otherwise.} \end{cases}$$

A new state $(l_1, l_2, \ldots, l_m)$ is reachable from $(k_1, k_2, \ldots, k_m)$ if and only if
$l_i \ge j_i$ for $1 \le i \le m$. If the above condition is satisfied the probability of the
state transition is given by

[Figure 2.2  Next states accessible from initial state (2,2,0,0). Columns: initial state; add 1 Pc; add 1 more Pc (final terminal states).]

probabilities for the representative class of states. All the different ways of
obtaining  the  same  partition  are  lumped  together  to  form  a  reduced  state. 

To  illustrate  a  computational  method**  for  generating  the  transition 
probabilities  consider  an  example  of  a  4  by  4  system.  The  number  4  can  be 
partitioned  in  5  different  ways  as  listed  below: 

4  0  0  0
3  1  0  0
2  2  0  0
2  1  1  0
1  1  1  1


These partitions represent 5 equivalence classes that characterize the
state of the Markov chain. Let us consider the state (2,2,0,0). At the end of a
memory cycle, the resultant partial state is (1,1,0,0) with 2 free processors to
be reassigned. Figure 2.2 shows the different ways in which these 2 Pc’s can be
assigned, one at a time, to reach a new partial representative state. After both
Pc’s are assigned a terminal state is reached. The number on the arrow indicates
the number of ways of reaching the partial or terminal state that the arrow
points to. Now the number of ways in which a final state can be reached from the
initial state can be computed by traversing the tree, e.g. there are 2x1 ways of
reaching (1,1,1,1) and (2x2 + 2x3) ways of reaching (2,1,1,0) from (2,2,0,0).

It is possible to construct a single tree with different pointers for
different initial states. Figure 2.3 shows a complete tree for a 4x4 system.
Initial states are circled. The entire transition matrix can be filled by
traversing this tree. A convenient way of traversing this tree is by using a
stack which has depth equal to one more than the number of Pc’s. At each level
the stack contains a partial state and has a pointer to the initial
representative state (if any) from which it is derived. The stack is initialized
to contain the path that leads to the topmost final state. For this example the
stack is initialized as shown in Fig. 2.4, and Fig. 2.5 shows an algorithm† for
using the tree to generate the transition matrix, shown in Fig. 2.6.

**The use of a tree to generate the transition probabilities was suggested by F.
Baskett and D. Chewning of Stanford University.

[Figure 2.3  Enumeration tree for a 4 by 4 multiprocessor system, with levels 0 through 4.]
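For small systems the transition probabilities between representative states can also be obtained by brute force, which gives an independent check on the tree traversal. The sketch below is not the stack-based algorithm of Fig. 2.5: it simply enumerates all m^x equally likely reassignments of the x serviced Pc’s, lumps the results by sorting, and then finds the stationary distribution by repeated multiplication. The function names and the number of sweeps are assumptions made for illustration; the enumeration is only practical for small n and m.

```python
from itertools import product


def transitions(state, m):
    """Return {next_state: probability} for one memory cycle.

    `state` is a representative state (queue lengths in non-increasing
    order). One Pc leaves each non-empty queue and picks a new Mp uniformly.
    """
    partial = [k - 1 if k > 0 else 0 for k in state]
    x = sum(1 for k in state if k > 0)          # Pc's serviced this cycle
    counts = {}
    for choice in product(range(m), repeat=x):  # m**x equally likely outcomes
        nxt = partial[:]
        for j in choice:
            nxt[j] += 1
        key = tuple(sorted(nxt, reverse=True))  # lump equivalent compositions
        counts[key] = counts.get(key, 0) + 1
    total = m ** x
    return {s: c / total for s, c in counts.items()}


def exact_busy(n, m, sweeps=200):
    """Average number of busy Mp's in the steady state of the reduced chain."""
    start = (n,) + (0,) * (m - 1)
    table = {}
    todo = [start]
    while todo:                                 # discover reachable states
        s = todo.pop()
        if s in table:
            continue
        table[s] = transitions(s, m)
        todo.extend(table[s])
    prob = {s: 1.0 / len(table) for s in table}
    for _ in range(sweeps):                     # power iteration
        new = dict.fromkeys(table, 0.0)
        for s, row in table.items():
            for t, p in row.items():
                new[t] += prob[s] * p
        prob = new
    return sum(p * sum(1 for k in s if k > 0) for s, p in prob.items())


if __name__ == "__main__":
    print(transitions((2, 2, 0, 0), 4))
    print(exact_busy(4, 4))
```

From (2,2,0,0) this gives 10 of the 16 outcomes leading to (2,1,1,0) and 2 leading to (1,1,1,1), matching the counts (2x2 + 2x3) and 2x1 quoted above.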

The tree in Fig. 2.3 can be converted into a mesh by lumping together all
occurrences of a partial state in the tree, e.g. state 2100 at level 3 appears
twice. The resulting mesh for the 4 by 4 example is shown in Fig. 2.7. The
algorithm for generating the transition matrix is shown in Fig. 2.8. Though the
implementation of this algorithm‡ involves a matrix multiplication and requires
more temporary storage, it is faster than the preceding algorithm for large
n. Thus, a space-time trade-off affects the selection of the algorithm to be
used.

†A FORTRAN implementation of this algorithm is listed in Appendix A-1.
‡See Appendix A-2 for a listing of a FORTRAN implementation.

[Figure 2.4  Initial contents of the stack for traversing the tree shown in Figure 2.3. Rows: level 4 down to level 0. Columns: initial state pointer; STACK; NWAYS (number of ways of getting to level L from level L-1).]

[Figure 2.5  Algorithm for traversing the tree shown in Figure 2.3]

STEP 1 :  Xij is the number of ways of reaching i from j
          (obtained from the tree of Fig. 2.3).

                                from j
  to i       4000     3100     2200      2110        1111

  4000         1        1        0         1           4
  3100         3       3+3       2       3+3+6     12+12+24
  2200         0        3        2        3+6        12+24
  2110         0        6       4+6     6+12+18    24+48+72
  1111         0        0        2         6          24

STEP 2 :  $X_{ij} = X_{ij} \big/ \sum_i X_{ij}$

          (Note that $\sum_i X_{ij} = m^x$, where x of the m components of j are non-zero.)

Final equations to be solved simultaneously:

$$
\begin{pmatrix} P_{4000} \\ P_{3100} \\ P_{2200} \\ P_{2110} \\ P_{1111} \end{pmatrix}
=
\begin{pmatrix}
0.25 & 0.0625 & 0.000 & 0.015625 & 0.015625 \\
0.75 & 0.3750 & 0.125 & 0.187500 & 0.187500 \\
0.00 & 0.1875 & 0.125 & 0.140625 & 0.140625 \\
0.00 & 0.3750 & 0.625 & 0.562500 & 0.562500 \\
0.00 & 0.0000 & 0.125 & 0.093750 & 0.093750
\end{pmatrix}
\begin{pmatrix} P_{4000} \\ P_{3100} \\ P_{2200} \\ P_{2110} \\ P_{1111} \end{pmatrix}
$$

subject to $P_{4000} + P_{3100} + P_{2200} + P_{2110} + P_{1111} = 1$

Figure 2.6  Steps in the generation of the transition matrix


The  following  theorem  and  lemma  can  be  used  to  increase  the  efficiency  of 
the  program  that  generates  the  transition  probabilities. 

Theorem 1: There is a one-to-one correspondence between a representative state
and a partial state that the representative state reduces to at the end of a
cycle.

Proof: Let $(k_1, k_2, \ldots, k_m)$ be a representative state. The partial state at
the end of the cycle is given by $(j_1, j_2, \ldots, j_m)$,

$$\text{where } j_i = \begin{cases} k_i - 1 & \text{if } k_i > 0 \\ 0 & \text{otherwise} \end{cases}$$

Since no two representative states are alike and $\sum_{i=1}^{m} k_i = n$, it follows that the
partial states are distinct.


Lemma  :  A  partial  state  at  level  L  in  the  enumerative  tree  of  Fig.  2.3  can 
correspond  to  a  terminal  state  with  exactly  n-L  occupied  Mp’s. 


Proof: Let $J = (j_1, j_2, \ldots, j_m)$ be a partial state in the tree depicted in Fig.
2.3. Furthermore, let the number of non-zero elements in the partial
state be y and let $\sum_{i=1}^{m} j_i = n - x$. Since one Pc is always removed from a non-empty
queue at the end of a cycle, J is a partial state that can be reduced from a
valid representative state $K = (k_1, k_2, \ldots, k_m)$, if and only if

(i) The number of non-zero elements in K is x, and

(ii) x≥y

Note that x and y are both less than or equal to min(m,n) and $\sum_{i=1}^{m} k_i = n$. Then, if
x≥y, J has at least x-y zeros. If x<y then there is no representative state K
that corresponds to the partial state J. If x≥y, then the representative state
is obtained by adding y 1’s to the non-zero elements of J and replacing x-y
zeros of J by 1. At level L, $\sum_{i=1}^{m} j_i = L$. Therefore, x, the number of occupied Mp’s
in K, is equal to n-L.


Figure 2.9 shows the average number of busy Mp’s when n=m. The curve has an
almost constant slope of .586 for n>4. Thus, this model also shows the absence
of a law of diminishing returns. Figures 2.10 and 2.11 show the effect of adding
a Pc and an Mp respectively on the average number of busy Mp’s. Also, the
average number of busy Mp’s is almost symmetrical with respect to m and n. The
results obtained from the model are compared with a less restrictive simulation
model in section 2.7.

[Figure 2.9  Multiprocessor Systems]


2.3  APPROXIMATE  DISCRETE  MARKOV  CHAIN  MODELS

Even with the representative state approach the number of states
characterizing the Markov chain increases rapidly as n increases. Table 2.1
shows the number of representative states as a function of n and the approximate
execution time needed on a DEC PDP-10 for the FORTRAN program listed in Appendix
A-2, which determines the stationary state probabilities. Though the analysis is
exact, the size of the problem (as indicated by the array space used by the
program) and the time required restrict the use of the model described in
section 2.2.

2.3.1  A  New  Approximate  Discrete  Markov  Chain  Model 

Because  of  the  high  cost  of  computation  for  the  previous  model,  an 
approximate  discrete  Markov  chain  model  will  now  be  proposed,  and  the  results  of 
section  2.2  will  be  used  to  improve  the  applicability  of  this  new  approximate 
model.  The  state  is  denoted  by  the  number  of  active  Pc’s.  Thus  the  number  of 
states  is  min(n,m).  Note  that  only  those  Pc’s  that  are  active  during  the  current 
cycle  make  new  requests  during  the  next  cycle.  Also,  the  number  of  busy  memories 
is  equal  to  the  number  of  active  processors.  The  approximation  propounded  here 
consists  of  removing  the  non-active  Pc’s  from  the  Mp  queues  and  reassigning  them 
as indicated below. This approach was motivated by a gross intuitive feeling
that if the Pc’s are removed from the queues and asked to make new requests,
they  would  end  up  in  the  same  queues  as  before.  However,  this  is  not  exactly 
true,  and  the  heavily  congested  states  tend  to  be  de-emphasized.  Let  the  number 
of  busy  Mp’s  during  the  current  cycle  be  i.  Then,  during  the  next  cycle  the  i 
active  Pc’s  make  a  new  request  to  the  m  Mp’s  and  some  of  the  n-i  non-active  Pc’s 
get  serviced  if  they  are  at  the  front  of  the  queue.  However,  in  this  approximate 
model,  the  n-i  non-active  Pc’s  are  removed  from  the  i  Mp  queues  and  reassigned 
to  the  same  i  queues.  This  is  equivalent  to  the  n-i  Pc’s  making  new  requests  to 
the i Mp’s. Thus the n-i Pc’s may not end up in the same queues that they were
removed from. This approximation will be used widely in this thesis. The results
of  this  section  show  the  accuracy  of  the  approximation. 

Now, the probability that j out of the i active Pc’s make a new request at
the beginning of the next cycle to one of the i Mp’s that are busy during the
current cycle, is given by

$$\text{XPROB} = \binom{i}{j} \left(\frac{i}{m}\right)^{j} \left(1 - \frac{i}{m}\right)^{i-j}$$

Thus, during the next cycle, with probability XPROB the n-i non-active Pc’s and
j active Pc’s are assigned to the i busy Mp’s of the current cycle, and the
remaining i-j active Pc’s make a request to the other m-i Mp’s.

Let nn=n-i+j and $k_1$=min(nn,i). Note that nn denotes the number of Pc’s that
will be queued (during the next cycle) for the i Mp’s that are busy during the
current cycle. Also, let $X(l_1)$ denote the conditional probability that $l_1$ out of
i busy Mp’s are also busy during the next cycle, given that n-i+j Pc’s will be
queued for the i Mp’s. The number of ways that nn different Pc’s can be assigned
to i different Mp queues is $i^{nn}$ [RiorJ58, p. 90]. Also the number of ways that the
nn Pc’s can access i Mp’s so that exactly $l_1$ Mp’s are occupied and $i-l_1$ are not
is given by Riordan [RiorJ58] as

$$CM(i,l_1) \cdot S(nn,l_1)$$

where $CM(i,l_1) = i(i-1)\cdots(i-l_1+1)$
and $S(nn,l_1)$ is the Stirling† number
of the second kind.

Thus,

$$X(l_1) = CM(i,l_1) \cdot S(nn,l_1) / i^{nn}$$

Now, let $Y(l_2)$ be the conditional probability that $l_2$ out of the m-i currently
non-busy Mp’s are busy during the next cycle, given that i-j Pc’s make a
request. Then,

$$Y(l_2) = CM(m-i,l_2) \cdot S(i-j,l_2) / (m-i)^{i-j}$$

Thus, the probability that $k = l_1 + l_2$ Mp’s will be busy during the next cycle is
$\text{XPROB} \cdot X(l_1) \cdot Y(l_2)$.

Therefore, TRANS(k,i), the probability of a transition from current state i to
next state k is

$$\text{TRANS}(k,i) = \sum_{j=0}^{i} \; \sum_{l_1 + l_2 = k} \text{XPROB} \cdot X(l_1) \cdot Y(l_2)$$

where the summation is over j and over the different ways of choosing
$l_1$ and $l_2$ such that $k = l_1 + l_2$.

†Stirling numbers of the second kind are used to convert from powers to binomial
coefficients.

$$x^n = \sum_k S(n,k) \binom{x}{k} k!$$

Also,

$$S(i,j) = j \cdot S(i-1,j) + S(i-1,j-1)$$

with S(i,0)=S(0,i)=0
and S(i,i)=1


TABLE 2.2

Comparison of Exact and Approximate Models

Approximate Discrete Markov Chain Model for tp=tw
Average Number of Busy Mp’s

            m=2        m=4        m=8        m=16

n=2       1.5000     1.7500     1.8750     1.9735
n=4       1.8000     2.6550     3.2751     3.6291
n=8       1.9846     3.4858     5.0999     6.3680
n=16      1.9999     3.9343     6.8436    10.0058

Exact Discrete Markov Chain Model
Average Number of Busy Mp’s

            m=2        m=4        m=8        m=16

n=2       1.5000     1.7500     1.8750     1.9735
n=4       1.7500     2.6210     3.2652     3.6268
n=8       1.8750     3.2657     4.9471     6.3149
n=16      1.9375     3.6270     6.3154     9.6258
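The quantities XPROB, X, Y and TRANS defined in this section translate directly into a short program. The sketch below is written for illustration and is not the FORTRAN program of Appendix A-3; the function names, the convention S(0,0)=1, the treatment of an empty set of Mp’s, and the use of power iteration for the steady state are assumptions of this example.

```python
from math import comb


def stirling2(a, b):
    """Stirling number of the second kind, S(a, b), via the usual recurrence."""
    table = [[0] * (b + 1) for _ in range(a + 1)]
    table[0][0] = 1
    for r in range(1, a + 1):
        for c in range(1, min(r, b) + 1):
            table[r][c] = c * table[r - 1][c] + table[r - 1][c - 1]
    return table[a][b]


def falling(a, b):
    """CM(a, b) = a(a-1)...(a-b+1); the empty product is 1."""
    out = 1
    for t in range(b):
        out *= a - t
    return out


def occupancy(pcs, mps, occupied):
    """Probability that `pcs` Pc's choosing uniformly among `mps` Mp's
    occupy exactly `occupied` of them."""
    if mps == 0:
        return 1.0 if pcs == 0 and occupied == 0 else 0.0
    return falling(mps, occupied) * stirling2(pcs, occupied) / mps ** pcs


def trans(n, m):
    """T[k][i] = probability of moving from i busy Mp's to k busy Mp's."""
    size = min(n, m)
    T = [[0.0] * (size + 1) for _ in range(size + 1)]
    for i in range(1, size + 1):
        for j in range(i + 1):
            xprob = comb(i, j) * (i / m) ** j * (1 - i / m) ** (i - j)
            if xprob == 0.0:
                continue
            nn = n - i + j                      # Pc's queued for the i busy Mp's
            for l1 in range(min(nn, i) + 1):
                x = occupancy(nn, i, l1)
                for l2 in range(min(i - j, m - i) + 1):
                    T[l1 + l2][i] += xprob * x * occupancy(i - j, m - i, l2)
    return T


def approx_busy(n, m, sweeps=500):
    """Average number of busy Mp's predicted by the approximate chain."""
    T = trans(n, m)
    size = min(n, m)
    prob = [0.0] + [1.0 / size] * size
    for _ in range(sweeps):                     # power iteration to steady state
        prob = [sum(T[k][i] * prob[i] for i in range(1, size + 1))
                for k in range(size + 1)]
    return sum(k * p for k, p in enumerate(prob))


if __name__ == "__main__":
    for n, m in [(2, 2), (4, 2), (2, 4), (4, 4)]:
        print(n, m, round(approx_busy(n, m), 4))
```

The state space has only min(n,m) states, so the computation stays cheap even for a 16x16 system, which is the point of the approximation.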

A  FORTRAN  program  that  determines  the  steady  state  probabilities  is  listed 
in  Appendix  A-3.  Table  2.2  shows  the  average  number  of  busy  Mp’s  as  predicted  by 
this  approximate  model.  Due  to  the  small  number  of  states,  this  approximate 
model  needs  much  less  computer  time;  typically  about  1  second  of  execution  time 
on  a  PDP-10  for  a  16x16  multiprocessor  system.  Table  2.2  shows  that  the  average 
number  of  busy  Mp’s  is  almost  symmetric  in  m  and  n.  The  approximate  model  has  a 
larger error for n>m. Therefore, a better estimate of the performance of an nxm
system can be obtained by evaluating the performance of an mxn system if n>m, a
conclusion possible only due to the results of the exact analysis of section
2.2.

2.3.2  Strecker’s  Approximation 

Strecker  [StreW70]  has  an  approximate  closed  form  solution  to  the  discrete 
Markov  Chain  model  presented  here.  His  approach  is  equivalent  to  removing  the 
queued  processors  from  all  the  memory  modules  at  the  end  of  a  memory  cycle  and 
reassigning  them  among  all  the  memory  modules.  In  the  approximate  model  proposed 
earlier  in  section  2.3.1,  the  Pc’s  that  were  queued  at  the  end  of  the  cycle  were 
reassigned only among the busy Mp’s. Thus, Strecker’s analysis is more
approximate and will underestimate the interference. However, Strecker obtains
his approximate solution in a closed form, which will be modified here to yield
more accurate estimates of the MpAR. Thus the state of the system is considered
independent of the state during the last cycle. If we use this assumption the
distribution of Pc’s queued for an Mp follows the binomial distribution:

$$\text{Prob}\{Y = r\} = \binom{n}{r} \left(\frac{1}{m}\right)^{r} \left(1 - \frac{1}{m}\right)^{n-r}$$

where Y is a random variable equal to the number of Pc’s queued
for Mp[j] and $P_{ij} = 1/m$ for all i and j.

Thus,

$$\text{Prob}\{Mp[j] \text{ is busy}\} = 1 - \text{Prob}\{\text{nobody is queued for } Mp[j]\} = 1 - (1 - 1/m)^n$$

In other words, the occupancy of Mp[j] is $1-(1-1/m)^n$, and

$$E[\text{no. of occupied Mp's}] = \sum_{j=1}^{m} \{\text{Occupancy of } Mp[j]\} = m\,[1 - (1 - 1/m)^n]$$

Table 2.3 shows a comparison of Strecker’s results and the exact Markov
chain analysis. Note that Strecker’s results are optimistic estimates of the
unit execution rate. It is encouraging to note that such a simple expression is
within 6 to 8% of the exact Markov chain model for m/n>0.75. This is because his
analysis assumes that all n Pc’s always make a new request at the beginning of
each memory cycle, whereas in the discrete Markov chain only those Pc’s that
receive service are allowed to make new requests. Moreover, note that the
expression $m\,[1-(1-1/m)^n]$ can be written in an exponential form as

$$m\,\{1 - \exp[\,n \ln(1 - 1/m)\,]\}$$

Figure 2.12 shows a plot of the above expression for fixed m; the relaxation
time $[-1/\ln(1-1/m)]$ approaches m as m gets large.

TABLE 2.3

Expected number of busy memories in one cycle

Number of Pc’s = 1,2,...,8 (rows)
Number of Mp’s = 1,2,...,8 (columns)

Discrete Markov Chain Model

1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
1.0000   1.5000   1.6667   1.7500   1.8000   1.8333   1.8571   1.8750
1.0000   1.6667   2.0476   2.2692   2.4095   2.5054   2.5748   2.6272
1.0000   1.7500   2.2701   2.6210   2.8630   3.0365   3.1657   3.2652
1.0000   1.8000   2.4102   2.8633   3.1996   3.4530   3.6482   3.8019
1.0000   1.8333   2.5059   3.0370   3.4533   3.7809   4.0415   4.2518
1.0000   1.8571   2.5751   3.1663   3.6486   4.0418   4.3636   4.6292
1.0000   1.8750   2.6274   3.2657   3.8024   4.2521   4.6294   4.9471

Strecker’s Approximation

1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
1.0000   1.5000   1.6667   1.7500   1.8000   1.8333   1.8571   1.8750
1.0000   1.7500   2.1111   2.3125   2.4400   2.5278   2.5918   2.6406
1.0000   1.8750   2.4074   2.7344   2.9520   3.1065   3.2216   3.3105
1.0000   1.9375   2.6049   3.0508   3.3616   3.5887   3.7613   3.8967
1.0000   1.9687   2.7366   3.2881   3.6893   3.9906   4.2240   4.4096
1.0000   1.9844   2.8244   3.4661   3.9514   4.3255   4.6206   4.8584
1.0000   1.9922   2.8829   3.5995   4.1611   4.6046   4.9605   5.2511

Percentage Error

0.0000   0.0000   0.0000    0.0000   0.0000   0.0000   0.0000   0.0000
0.0000   0.0000   0.0000    0.0000   0.0000   0.0000   0.0000   0.0000
0.0000   4.9979   3.1012    1.9082   1.2658   0.8941   0.6602   0.5100
0.0000   7.1429   6.0482    4.3266   3.1086   2.3053   1.7658   1.3874
0.0000   7.6389   8.0782    6.5484   5.0631   3.9299   3.1002   2.4935
0.0000   7.3858   9.2063    8.2680   6.8340   5.5463   4.5157   3.7114
0.0000   6.8548   9.6812    9.4685   8.2991   7.0191   5.8896   4.9512
0.0000   6.2507   9.7244   10.2214   9.4335   8.2900   7.1521   6.1450

The exact discrete Markov chain model of section 2.2 shows the performance
to be almost symmetric in n and m. Also the analysis of the error of Strecker’s
approximation, shown in Table 2.4, indicates a greater accuracy for n<m. Thus, a
more accurate estimate of the average number of busy Mp’s is $i\,[1-(1-1/i)^j]$,
where i=max(n,m) and j=min(m,n). Note that the above formula was not derived by
Strecker. It was possible to obtain it due to the knowledge gained from the
exact analysis presented in section 2.2.
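Both closed-form estimates of this section are simple to evaluate. The following sketch uses hypothetical function names: it computes Strecker’s expression m[1-(1-1/m)^n], the symmetric variant that swaps the roles of n and m when n>m, and the relaxation time -1/ln(1-1/m) of the exponential form.

```python
from math import log


def strecker(n, m):
    """Strecker's estimate of the expected number of busy Mp's."""
    return m * (1 - (1 - 1 / m) ** n)


def strecker_symmetric(n, m):
    """Modified estimate: apply Strecker's formula with i=max(n,m), j=min(n,m)."""
    i, j = max(n, m), min(n, m)
    return i * (1 - (1 - 1 / i) ** j)


def relaxation_time(m):
    """Relaxation time of m*(1 - exp(n*ln(1-1/m))) viewed as a function of n (m > 1)."""
    return -1 / log(1 - 1 / m)


if __name__ == "__main__":
    print(strecker(8, 8))               # compare with the last entry of Table 2.3
    print(strecker(8, 2), strecker_symmetric(8, 2))
    print(relaxation_time(16))
```

For n=8 and m=2, for example, the original expression gives about 1.99, while the symmetric variant gives 1.875, which agrees with the exact value for that case in Table 2.3.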


2.4  DISCRETE  MARKOV  CHAIN  MODEL  OF  SKINNER  AND  ASHER 

Skinner and Asher [SkinC69] model the multiprocessor system with tp=tw as a
discrete Markov chain. They assume a matrix of probabilities that express the
likelihood that a given processor requests service from a given memory at the
beginning of a memory cycle, provided the Pc is not queued. They also assume a
matrix of probabilities that express the likelihood of the various outcomes that
can arise when there are simultaneous requests to one memory by several
processors. The state of the system is characterized by the processors queued
for the different memory modules. A state transition matrix is formed from the
access probabilities and the steady state probabilities of various states are
determined by solving the state transition equations. The number of states of
the system increases very steeply with an increase in the number of Pc’s and
Mp’s. Closed form solutions are presented only for cases with up to 2 Pc’s and n
Mp’s. The analysis in the previous section is similar to Skinner and Asher, but
with uniformly random access patterns for all the Pc’s, i.e. $p_{ij} = 1/m$ for all i.
The results of Skinner and Asher are compared with a new approximate model in
section 2.6.


2.5  DIFFUSION  APPROXIMATIONS 

An approximation method that has been proposed for the solution of general
queueing networks is the diffusion approximation [cf. NeweG71; KobaH73]. A
discrete-state process is approximated by a Wiener-Levy diffusion process with a
continuous path. The key assumption in such an analysis is that incremental
changes in the queue lengths are normally distributed. This leads to a
characterization of the queueing network by a set of diffusion equations. The
accuracy of the approximation depends on three factors: (i) approximation of a
discrete-state process by a time-continuous Markov process, (ii) choice of
proper reflecting barriers, and (iii) discretization of the continuous density
function for queue lengths. Surprisingly, for the simple discrete Markov chain
model of section 4, the diffusion approximation yields a result identical to
that with exponential servers derived from Jackson’s formulae. However, the main
utility of the diffusion approximation in this context is that it can be used to
analyze the effect of different coefficients of variation (ratio of standard
deviation to the mean) for the service time distribution. Unfortunately, for
$P_{ij} = 1/m$, the diffusion approximation predicts the average number of busy
memories to be independent of the service time distribution as long as all
servers are identical. Thus, the diffusion approximation has proved to be a
disappointing tool in this study.


2.6  AN  APPROXIMATE  MODEL  FOR  ARBITRARY  Pij


In this section, we explore the effect of non-uniform access probabilities
(i.e. Pij is no longer restricted to be equal to 1/m) on the MpAR. This
situation often arises in physical systems in which each Pc has a greater
preference for a different memory module. Now, we analyze multiprocessor systems
in which Pij can take any arbitrary value between 0 and 1 subject to

$$\sum_{j=1}^{m} P_{ij} = 1 .$$

Let Qij denote the probability that processor i is queued for memory j. The
probability that memory j is busy or occupied is given by

$$1 - \mathrm{Prob}\{\text{no processor is queued for memory } j\} = 1 - \prod_{i=1}^{n} (1 - Q_{ij})$$

assuming that the event of a Pc not being queued for an Mp is independent
of other Pc’s not being queued. The simulation results shown later justify this
assumption.

The above denotes the average number of requests serviced in one memory
cycle.

Thus, if M is the expected value of the number of occupied memories during
a memory cycle, then

$$M = \sum_{j=1}^{m} \mathrm{Prob}[\,Mp[j] \text{ is occupied}\,] = \sum_{j=1}^{m} \Bigl[\, 1 - \prod_{i=1}^{n} (1 - Q_{ij}) \Bigr]$$

In general, the probabilities Qij and Pij are not equal. The Pij’s are a
characteristic of each processor and therefore independent of the behavior of
the other processors in the system. However, any Qij is a function of all the
Pij’s of the multiprocessor system. Strecker [StreW70] has evaluated the unit
execution rate of a multiprocessor system in which Pij = 1/m for all values of i
and j. He makes no attempt to obtain a relationship between Qij and Pij.
Strecker assumes that Pij = Qij and states that his results are approximate. The
approximation in his analysis is due to the assumed binomial distribution for
the queued processors. If all the Pc’s are identical and have equal likelihood
of accessing every memory unit, then the probability of any processor being
queued for any memory is uniformly equal. Since the memories operate
synchronously and all requests occur at the end of a memory cycle, a processor is
always queued. Hence, the probability Qij = 1/m.

Let us focus our attention on Pc[i] and Mp[j]. The time spent by Pc[i] in
queue for Mp[j] depends on Pij and Qkj, k≠i. Thus, Qij depends on other Qkj’s,
which in turn depend on Qij. Let us for a moment allow other processors to make
requests to memory before Pc[i]; and let Yij denote the probability that none of
the other n-1 Pc’s request service from Mp[j].

Thus,

$$Y_{ij} = \prod_{k \neq i} (1 - P_{kj})$$

Now, if none of the other Pc’s make a request to Mp[j] the waiting time
(including service) is one cycle time. However, if other Pc’s make a request to
Mp[j] before Pc[i], then Pc[i] has to wait for those Pc’s to be served. Now, let
us look at Pc[l], which has to wait for service from Mp[j] if other processors
make a request before it does. Here, we shall allow other Pc’s to make requests
before Pc[l]. Thus, Pc[l] waits in queue for Mp[j] with probability Plj*[1-Ylj].
Thus, the average number of Pc’s that Pc[i] finds waiting before itself is the
sum of Plj*[1-Ylj] over all l≠i. However, Pc[i] accesses Mp[j] with probability
Pij. Therefore, the weighted waiting time Tij for Pc[i] in queue for Mp[j] is

$$T_{ij} = P_{ij} \Bigl\{ Y_{ij} + \Bigl[ 1 + \sum_{l \neq i} P_{lj} (1 - Y_{lj}) \Bigr] (1 - Y_{ij}) \Bigr\}$$

Therefore, the average time is equal to $\sum_{j=1}^{m} T_{ij}$; and

$$Q_{ij} = T_{ij} \Big/ \sum_{j=1}^{m} T_{ij}$$

[Figure 2.13: Comparison of approximate model and Skinner and Asher's results]
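The chain of definitions above (Yij, then Tij, then Qij, then M) can be sketched directly in code. This is a minimal illustration, assuming the formulas as reconstructed in this section; the function names `busy_memories` and `preference_matrix` are hypothetical and not part of the thesis. The preference matrix uses a probability alpha on the diagonal and (1-alpha)/(m-1) elsewhere, which is the access pattern examined later in this section.

```python
from math import prod

def busy_memories(P):
    """Approximate expected number of busy Mp's for an access matrix P[i][j]."""
    n, m = len(P), len(P[0])
    # Y[i][j]: probability that none of the other Pc's requests Mp[j]
    Y = [[prod(1 - P[k][j] for k in range(n) if k != i) for j in range(m)]
         for i in range(n)]
    # T[i][j]: weighted waiting time of Pc[i] in the queue for Mp[j]
    T = [[P[i][j] * (Y[i][j]
                     + (1 + sum(P[l][j] * (1 - Y[l][j])
                                for l in range(n) if l != i))
                     * (1 - Y[i][j]))
          for j in range(m)] for i in range(n)]
    # Q[i][j]: probability that Pc[i] is queued for Mp[j]
    Q = [[T[i][j] / sum(T[i]) for j in range(m)] for i in range(n)]
    # M: sum over memories of the probability that the memory is occupied
    return sum(1 - prod(1 - Q[i][j] for i in range(n)) for j in range(m))

def preference_matrix(n, m, alpha):
    """alpha where i == j, (1 - alpha)/(m - 1) elsewhere (illustrative helper)."""
    beta = (1 - alpha) / (m - 1)
    return [[alpha if i == j else beta for j in range(m)] for i in range(n)]

if __name__ == "__main__":
    # Table 2.4 lists an analytic result of 10.4174 for a 16x16 system at 0.25.
    print(busy_memories(preference_matrix(16, 16, 0.25)))
```

With alpha = 1/m every Pij equals 1/m, every Qij equals 1/m, and the sketch reduces to m*[1-(1-1/m)^n].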

Figure 2.13 compares the execution rate predicted by the approximate model
of this section with the exact Markov chain model of Skinner and Asher for a 2x2
multiprocessor system. Pij is equal to α for i=j and equal to β for i≠j.

Now, suppose each Pc has a greater preference for one Mp. Let us use the
model to examine the effect of assigning access probabilities so that Pij = α > 1/m
for i=j and Pij = β = (1-α)/(m-1) for i≠j. Note that β < 1/m. Thus, each Pc has a
greater preference for a certain memory module. For example, the access
probability matrix for a 4x5 multiprocessor system is shown below.

    α  β  β  β  β
    β  α  β  β  β
    β  β  α  β  β
    β  β  β  α  β

Figures 2.14 and 2.15 show the effect of changing α from 0 to 1 for an 8x16
and a 16x16 multiprocessor system. Also, Table 2.4 compares the results
predicted by the model with simulation results, because the analysis is
approximate. Note that both graphs show that the execution rate is a minimum for
α=1/m. With α=1/m this model predicts the same result as Strecker’s
approximation as both models assume a binomial distribution for the queued
processors. The approximation error reduces as α increases, the error being zero
for α=1. An error correction factor can be used as described below. For α=1/m


[Figure 2.15: The effect of α on the execution rate of a 16x16 multiprocessor
system. Vertical axis: average number of busy Mp’s.]
TABLE 2.4

Comparison of simulation results with analytic model of Section 2.3


A 16x16 Multiprocessor System

    α       90% confidence interval     Analytic
            from simulation             Result
    0.25    (9.7482  ,  9.8329)         10.4174
    0.50    (10.5824 , 10.5946)         10.9973
    0.75    (12.0875 , 12.3907)         12.5323
    0.90    (13.9111 , 14.2145)         14.3873


A 8x16 Multiprocessor System

    α       90% confidence interval     Analytic
            from simulation             Result
    0.25    (6.2803 , 6.4139)           6.4880
    0.50    (6.5374 , 6.5614)           6.6885
    0.75    (6.9693 , 7.0853)           7.1673
    0.90    (7.4338 , 7.6041)           7.6328


A 8x8 Multiprocessor System

    α       90% confidence interval     Analytic
            from simulation             Result
    0.25    (4.8886 , 5.1126)           5.2798
    0.50    (5.2797 , 5.4127)           5.5372
    0.75    (6.0326 , 6.1901)           6.2807
    0.90    (6.9245 , 7.1411)           7.1975

the exact Markov chain model should be used to compute the execution rate X1;
the approximate model of this section predicts an execution rate, X2, equal to
m*[1-(1-1/m)^n] for α=1/m. The correction factor for α=1/m is X1/X2. Let
e=(X2-X1)/X2. Then a linear error correction factor F is 1-e*(1-α)/(1-1/m) for
α>1/m. The corrected estimate is F*X, where X is the execution rate for the
given value of α. The dotted lines in figures 2.14 and 2.15 show the corrected
execution rates. The vertical lines show 90% confidence intervals obtained by
simulation.† This model shows the increase in the MpAR due to deskewing of the
processors’ access patterns.
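The linear correction described above is easy to mis-apply, so a small sketch may help. It assumes that X1 is taken from the exact discrete Markov chain model at α = 1/m (for an 8x8 system, Table 2.5 lists 4.9471) and that X is the uncorrected analytic value from Table 2.4; the function names are illustrative only.

```python
def uniform_rate(n, m):
    """X2: approximate-model execution rate at alpha = 1/m."""
    return m * (1 - (1 - 1 / m) ** n)

def correction_factor(alpha, m, x_exact, x_approx):
    """Linear correction factor F = 1 - e*(1 - alpha)/(1 - 1/m), alpha >= 1/m.

    x_exact  -- X1, exact Markov chain result at alpha = 1/m
    x_approx -- X2, approximate-model result at alpha = 1/m
    """
    e = (x_approx - x_exact) / x_approx
    return 1 - e * (1 - alpha) / (1 - 1 / m)

if __name__ == "__main__":
    n = m = 8
    x1 = 4.9471                 # exact model, 8x8 (Table 2.5)
    x2 = uniform_rate(n, m)     # approximate model at alpha = 1/m
    x = 5.2798                  # uncorrected analytic value at alpha = 0.25 (Table 2.4)
    f = correction_factor(0.25, m, x1, x2)
    print("F =", f, "corrected estimate =", f * x)
```

By construction F equals X1/X2 at α = 1/m and equals 1 at α = 1, where the approximate model has no error.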


2.7  CONCLUDING  REMARKS 

Tables  2.3  and  2.5  compare  the  numerical  results  obtained  from  the 
different  models  described.  Note  that  the  continuous  and  discrete  Markov  chain 
models  exhibit  similar  trends,  though  the  numerical  values  differ.  Strecker’s 
approximation  gets  better  as  m/n  increases,  whereas  the  continuous  time  and 
discrete  Markov  models  get  closer  for  larger  n/m  ratios.  Table  2.6  shows  some 
simulation  results  obtained  with  exponential  distributions  for  the  processing 

†All confidence intervals in this thesis will be shown by vertical lines. Unless
specified otherwise, the confidence intervals are calculated from about 10
independent samples, each averaged over about 3000 cycles.
TABLE 2.5

Expected number of busy memories in one cycle
Number of Pc’s = 1,2,...,8 (rows)
Number of Mp’s = 1,2,...,8 (columns)


Discrete Markov Chain Model

         m=1     m=2     m=3     m=4     m=5     m=6     m=7     m=8
  n=1   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
  n=2   1.0000  1.5000  1.6667  1.7500  1.8000  1.8333  1.8571  1.8750
  n=3   1.0000  1.6667  2.0476  2.2692  2.4095  2.5054  2.5748  2.6272
  n=4   1.0000  1.7500  2.2701  2.6210  2.8630  3.0365  3.1657  3.2652
  n=5   1.0000  1.8000  2.4102  2.8633  3.1996  3.4530  3.6482  3.8019
  n=6   1.0000  1.8333  2.5059  3.0370  3.4533  3.7809  4.0415  4.2518
  n=7   1.0000  1.8571  2.5751  3.1663  3.6486  4.0418  4.3636  4.6292
  n=8   1.0000  1.8750  2.6274  3.2657  3.8024  4.2521  4.6294  4.9471


Continuous Time Markov Chain Model

         m=1     m=2     m=3     m=4     m=5     m=6     m=7     m=8
  n=1   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
  n=2   1.0000  1.3333  1.5000  1.6000  1.6667  1.7143  1.7500  1.7778
  n=3   1.0000  1.5000  1.8000  2.0000  2.1429  2.2500  2.3333  2.4000
  n=4   1.0000  1.6000  2.0000  2.2857  2.5000  2.6667  2.8000  2.9091
  n=5   1.0000  1.6667  2.1429  2.5000  2.7778  3.0000  3.1818  3.3333
  n=6   1.0000  1.7143  2.2500  2.6667  3.0000  3.2727  3.5000  3.6923
  n=7   1.0000  1.7500  2.3333  2.8000  3.1818  3.5000  3.7692  4.0000
  n=8   1.0000  1.7778  2.4000  2.9091  3.3333  3.6923  4.0000  4.2667


Percentage Difference

         m=1     m=2      m=3      m=4      m=5      m=6      m=7      m=8
  n=1   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
  n=2   0.0000  11.1133  10.0018   8.5714   7.4056   6.4910   5.7671   5.1840
  n=3   0.0000  10.0018  12.0922  11.8632  11.0645  10.1940   9.3794   8.6480
  n=4   0.0000   8.5714  11.8982  12.7928  12.6790  12.1785  11.5519  10.9059
  n=5   0.0000   7.4056  11.0904  12.6882  13.1829  13.1190  12.7844  12.3254
  n=6   0.0000   6.4910  10.2119  12.1930  13.1266  13.4412  13.3985  13.1591
  n=7   0.0000   5.7671   9.3899  11.5687  12.7939  13.4049  13.6218  13.5920
  n=8   0.0000   5.1840   8.6549  10.9196  12.3369  13.1653  13.5957  13.7535

TABLE 2.6

Expected number of busy memories in one cycle:
Exponential distribution for tp
Constant tw = ta = E[tp]
Simulation results

         m=2     m=3     m=4     m=5     m=6     m=7     m=8
  n=2   1.4088  1.5931
  n=3   1.6185  1.9878  2.2075
  n=4           2.2198  2.5643  2.8004
  n=5                   2.7980  3.1472  3.4300
  n=6                           3.4088  3.7122  4.0040
  n=7                                   3.9990  4.3196  4.5804
  n=8                                           4.5666  4.9028

time, with mean equal to tw,

i.e. Prob{tp=x} = λ exp(-λx) where λ = 1/tw = 1/ta = 1/E[tp]

Note that the values in Table 2.6 lie between those predicted by Strecker and
Jackson, and within 5% of the exact discrete Markov chain model for most cases.
Thus, modeling the variable processing time by a constant equal to the mean
processing time is a reasonable simplification. Table 2.7 shows the
characteristics of the parameters in the various models.

It is important to note that with tp=tw a Pc is fast enough to make a new
request to memory when the memory recovers. Thus, for a 1x1 system the memory is
always busy. Also, with m>n, if there is no contention for memory the maximum
number of busy memories is min(m,n). An important result observed was the
absence of a law of diminishing returns: the performance of a multiprocessor
system with n processors and n memories continues to rise at a constant rate as
n increases. A simple exponential server model showed this rate to be 0.5; a
constant processing time model predicted a slope of 0.586 for the average number
of busy Mp’s. The exponential server model gives the average number of busy Mp’s
as n*m/(n+m-1). An approximate result for constant processing times gives the
average number of busy Mp’s as i*[1-(1-1/i)^j], where i=max(n,m) and j=min(n,m).
An intuitively obvious conclusion limits the maximum number of active Pc’s by
min(n,m). This maximum is reached if each processor accesses only one memory all
the time.
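The two closed-form results quoted above are simple enough to evaluate directly. The sketch below is illustrative only; the function names are assumptions, and the formulas are exactly those stated in the text.

```python
def busy_exponential(n, m):
    """Exponential server model: average number of busy Mp's."""
    return n * m / (n + m - 1)

def busy_constant_approx(n, m):
    """Approximate result for constant processing times."""
    i, j = max(n, m), min(n, m)
    return i * (1 - (1 - 1 / i) ** j)

if __name__ == "__main__":
    for size in (2, 4, 8, 16):
        print(size, busy_exponential(size, size), busy_constant_approx(size, size))
```

For n = m the exponential server expression is n^2/(2n-1), which approaches n/2 for large n; this is the rate of 0.5 mentioned above.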
TABLE 2.7

Model                  Processing     Memory Cycle   Analysis      Computational
                       Time           Time                         Ease

Discrete               Constant       Constant       Exact         Solution is
Markov Chain           tp=tw                                       algorithmic.
                                                                   Unwieldy for
                                                                   large n.

Strecker’s             Constant       Constant       Approximate   Closed form
Approximation                                                      solution.
                                                                   Simple formula.

Continuous Time        Exponential    Exponential    Exact         Closed form
Markov Chain                                                       solution.
                                                                   Simple formula.

Diffusion              Constant       Constant       Approximate   Closed form
Approximation                                                      solution.
                                                                   Simple formula.

Simulation             Exponential    Constant       Approximate   Unwieldy due to
Model                  E[tp]=tw=ta                                 slow stochastic
                                                                   convergence.



CHAPTER  3 

MULTIPROCESSOR  SYSTEMS  WITH  TP>TW 


In this chapter multiprocessors with tp>tw will be discussed. First, a
discrete Markov chain model for a constant processing time equal to tw+tc will
be developed and a general methodology for constant tp = tw+i*tc will be
presented. A more general model for processing time having a geometric
distribution with its mean value greater than tw will be described. An
exponential server model developed by McCredie [McCrJ73] will also be discussed.


3.1  DISCRETE  MARKOV  CHAIN  MODELS  FOR  MULTIPROCESSORS  WITH  TP=TW+TC


In this section, discrete Markov chain models will be developed for
multiprocessor systems in which the processing time tp is a constant and is
exactly equal to tw+tc. The timing of a typical instruction is shown below.

[Timing diagram of a typical instruction: Mp and Pc time lines, showing the
effective execution time and tp = tw+tc.]

Note that a Pc that is serviced during this cycle operates on its data during
the next memory cycle. This will be modeled by associating a server with each Pc


[Figure 3.1: Queueing model for multiprocessors with tp = tw+tc. The Pc’s form
a central multi-server station with service time tc; the Mp servers have
service time tc and are reached through a crosspoint switch.]

with a constant processing time equal to tc; and this portion of the processing
time will be called its effective execution time. As before, all processors will
be assumed to be identical with Pij = 1/m. The processor servers will be lumped
together as a multi-server station.

Now, if the number of Pc’s is n and the number of Mp’s is m, the state of the
queueing system shown in Fig. 3.1 can be described by a (m+1)-tuple
(k0; k1,k2,...,km), where k0 is the number of Pc’s in execute state and
ki, 1≤i≤m, is the number of Pc’s queued for Mp[i]. Since all the servers
contribute fixed delays equal to tc, all events (entities leaving and entering
queues) occur at epochs separated by integer multiples of tc. In other words,
the system behaves as if it were clocked at intervals of tc. Therefore, time
between significant events can be considered to advance in discrete steps. Note
that an equivalent system in which tw’ = 0, ta’ = tc and tp’ = tc can also be
represented by the model shown in Fig. 3.1.

Let (k0; k1,k2,...,km) be a representative state, i.e. it denotes all the
states that can be obtained by permuting (k1,k2,...,km). Further, let
k1,k2,...,km be arranged in non-increasing order. Since the number of Pc’s is
fixed, the ki (i = 0,...,m) sum to n. The state space can be divided into n+1
sub-spaces corresponding to integer values of k0 ranging from 0 to n. The number
of states in the 0-th sub-space is equal to the number of ways of partitioning
the integer n into m parts. In general, the i-th sub-space consists of states
corresponding to the partitions of n-i into m parts.
[Figure 3.2: A typical enumeration tree for initial state (2; 2,0,0,0). Tree
levels: initial state; add 1 Pc; add 1 more Pc.]


Consider a state vector in the i-th sub-space. The value of k0 is i. Let d
denote the number of non-zero parts of the memory-state-vector (k1,k2,...,km).
Then at the end of the current memory cycle i Pc’s make a new request and d Pc’s
enter the execute state, i.e. during the next cycle the state of the system is
located in the d-th sub-space. State transitions can be described by an
enumeration tree similar to that used in chapter 2, Fig. 2.3. Figure 3.2 shows
the transitions from the state (2; 2,0,0,0) for a 4x4 multiprocessor system. Such
trees can now be constructed for each state and the transition matrix evaluated.
In Fig. 3.2, there is 1 way of reaching (1; 3,0,0,0), (3+3*2) i.e. 9 ways of
reaching (1; 2,1,0,0) and 3*2 i.e. 6 ways of reaching (1; 1,1,1,0). Once the
entire transition matrix is generated the stationary state probabilities can be
obtained.

The number of states of this discrete Markov chain model increases faster
than that of the exact Markov chain model for tp=tw, described in chapter 2.
Table 3.1 compares the number of states for the two models. An approximate Markov
chain model will now be proposed. The system behavior will be modified to simplify
the analysis. At the end of each cycle the active Pc’s (those that are served by
memory during the current cycle) will enter the execute state. However, all the
queued Pc’s will be removed from the memory queues and will be allowed to make
new requests to memory along with those Pc’s that were in execute state during
the current cycle. Let the current state be (k0; k1,k2,...,km), with exactly d
Mp queues occupied. Then, at the end of the current cycle, d Pc’s enter the
TABLE 3.1

Comparison of the Number of States for
Discrete Markov Chain Models for
tp=tw and tp=tw+tc

                Number of States
    n=m       tp=tw      tp=tw+tc
     1           1           1
     2           2           3
     3           3           6
     4           5          11
     5           7          18
     6          11          29
     7          15          44
     8          22          66
     9          30          96
    10          42         138
    11          56         194
    12          77         271


execute state and n-d Pc’s make new requests. Thus, for this simplified model,
the state of the system is characterized by the number of Pc’s that make new
requests at the beginning of a cycle.

Now, if i Pc’s make a new request to m Mp’s, the probability that a given Mp
gets at least one request is 1-(1-1/m)^i. With i Pc’s accessing m Mp’s
simultaneously, the number of busy Mp’s can take any integer value from 1 up to
min(i,m). The probability of exactly j Mp’s being occupied is given by the ratio
of the number of ways that j Mp’s can be occupied and m-j Mp’s not be occupied
to the total number of ways of assigning i Pc’s among m Mp’s, i.e. m^i. In section
2.3, it was stated that the number of ways that exactly j out of m Mp’s can be
occupied by i Pc’s is CM(m,j)*S(i,j). Now, if j Mp’s are occupied during the
current cycle then n-j Pc’s make a request to Mp during the next cycle.
Therefore, given that i Pc’s made a request to Mp at the beginning of the
current cycle, the conditional probability that n-j Pc’s will make a new request
at the beginning of the next cycle (which is also the end of the current cycle)
is

    CM(m,j)*S(i,j)/m^i

Note that the above expression denotes the probability of a transition from a
current state i to a next state n-j. The value of j can range from 1 to
min(i,m).
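The transition probability above can be evaluated numerically. The definitions of CM and S come from section 2.3, which is not reproduced here, so the sketch below makes an explicit assumption: CM(m,j) is taken to be the binomial coefficient, and S(i,j) the number of ways i distinct Pc's can be spread over j given Mp's with none left empty (computed by inclusion-exclusion). Under that assumption the output agrees with the last column of Table 3.2. The function names are illustrative.

```python
from math import comb

def onto(i, j):
    """Assumed S(i,j): assignments of i distinct Pc's onto j Mp's, none empty."""
    return sum((-1) ** t * comb(j, t) * (j - t) ** i for t in range(j + 1))

def next_state_pmf(i, n, m):
    """Map next state (n - j) to its probability, given i requesting Pc's."""
    pmf = {}
    for j in range(min(i, m) + 1):
        ways = comb(m, j) * onto(i, j)      # assumed CM(m,j) * S(i,j)
        if ways:
            pmf[n - j] = ways / m ** i
    return pmf

if __name__ == "__main__":
    # 4x4 system with all four Pc's requesting at the beginning of the cycle
    for nxt, p in sorted(next_state_pmf(4, 4, 4).items()):
        print(nxt, p)
```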

For  a  multiprocessor  system  with  n  Pc’s  and  m  Mp’s,  let  k=min(m,n).  The 
number  of  occupied  memories  in  a  cycle  ranges  from  0  to  k.  Therefore,  the  number 
TABLE 3.2

Transition matrix for a 4x4 system with tp=tw+tc

    | x0 |     | 0.0000  0.0000  0.0000  0.0000  0.09375 |   | x0 |
    | x1 |     | 0.0000  0.0000  0.0000  0.3750  0.56250 |   | x1 |
    | x2 |  =  | 0.0000  0.0000  0.7500  0.5625  0.32813 |   | x2 |
    | x3 |     | 0.0000  1.0000  0.2500  0.0625  0.01563 |   | x3 |
    | x4 |     | 1.0000  0.0000  0.0000  0.0000  0.00000 |   | x4 |

    x0 + x1 + x2 + x3 + x4 = 1

[Figure: axes labeled "Fraction of n active" and "Fraction of m busy"]

of Pc’s that can be assigned at the beginning of a cycle ranges from n-k to n.
The approximate model has k+1 states. Let X(n-k), X(n-k+1), ..., X(n) denote the
steady state probabilities. Then, the execution rate is obtained as a sum over
the states i = n-k, ..., n, weighted by these probabilities.

Table 3.2 shows the transition matrix for a 4x4 multiprocessor system. Appendix
A-5 contains a listing of a FORTRAN program that computes the steady state
probabilities and hence the execution rate for an n x m system. Figure 3.3 depicts
plots of the execution rate as a function of m and n, obtained from the
approximate Markov chain model. Table 3.3 compares the analytic results with a
90% confidence interval obtained by a Monte Carlo simulation of the exact system
behavior. The simulation consisted of 10 independent experiments of length equal
to 4000 cycles.
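The following sketch shows one way the steady state probabilities and the execution rate of the approximate model could be computed. It is not the FORTRAN program of Appendix A-5. Two assumptions are made and should be read as such: the transition probabilities use a binomial coefficient times the number of onto assignments, and the execution rate is taken as the sum of (n-i)*X(i), since a cycle that starts with i requesting Pc's was preceded by n-i busy Mp's. Under these assumptions the sketch gives the 4x4 and 4x2 analytic entries of Table 3.3 (1.8276 and 1.6000).

```python
from math import comb

def onto(i, j):
    # assignments of i distinct Pc's onto j given Mp's with none left empty
    return sum((-1) ** t * comb(j, t) * (j - t) ** i for t in range(j + 1))

def transition_matrix(n, m):
    # T[nxt][cur]: cur Pc's request now, j Mp's are busy, nxt = n - j request next
    T = [[0.0] * (n + 1) for _ in range(n + 1)]
    for cur in range(n + 1):
        for j in range(min(cur, m) + 1):
            T[n - j][cur] += comb(m, j) * onto(cur, j) / m ** cur
    return T

def steady_state(T):
    # Solve (T - I) x = 0 with the last equation replaced by sum(x) = 1,
    # using Gauss-Jordan elimination with partial pivoting.
    size = len(T)
    A = [[T[r][c] - (1.0 if r == c else 0.0) for c in range(size)] + [0.0]
         for r in range(size)]
    A[-1] = [1.0] * size + [1.0]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][size] for r in range(size)]

def execution_rate(n, m):
    x = steady_state(transition_matrix(n, m))
    return sum((n - i) * x[i] for i in range(n + 1))

if __name__ == "__main__":
    print(execution_rate(4, 4), execution_rate(4, 2))
```

States below n-k are transient in this construction and receive zero steady state probability, so carrying all n+1 states costs nothing beyond a slightly larger matrix.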

3.1.1  General  Technique  for  Constant  tp=tw+i*tc

In general, if the processing time is a constant and equal to tw+i*tc, the
instruction timing diagram is as shown in Fig. 3.4a. In this case, the execution
phase is i cycles long. This can be modeled by an i-stage server shown in Fig.
3.4b. At any given time in the execute phase a Pc is in one of the i stages,
advancing one stage every cycle. Now, the system state can be represented by
(j1,j2,...,ji; k1,k2,...,km), where jl is the number of Pc’s in the
execute-stage l, and the ki’s denote the memory queue sizes. As before, let d
denote the number of non-zero ki’s. Then, at the beginning of the next cycle, d
TABLE 3.3

Average Number of Busy Mp’s
tp=tw+tc

    n x m      90% confidence interval     Analytic
               from simulation             Result
    2 x 2      (1.0000 , 1.0000)           1.0000
    2 x 4      (1.0000 , 1.0000)           1.0000
    2 x 8      (1.0000 , 1.0000)           1.0000
    2 x 16     (1.0000 , 1.0000)           1.0000
    4 x 2      (1.5647 , 1.5843)           1.6000
    4 x 4      (1.8194 , 1.8276)           1.8276
    4 x 8      (1.9126 , 1.9244)           1.9197
    4 x 16     (1.9418 , 1.9817)           1.9612
    8 x 2      (1.8251 , 1.8547)           1.9692
    8 x 4      (2.8460 , 2.9206)           3.0259
    8 x 8      (3.4858 , 3.5346)           3.5530
    16 x 2     (1.9086 , 1.9404)           1.9998
    16 x 4     (3.4936 , 3.5677)           3.8772
    16 x 8     (5.4431 , 5.6254)           5.9053
    16 x 16    (6.8140 , 6.9788)           7.0136

Pc’s enter execute-stage 1; ji Pc’s make new requests to Mp and Pc’s in the
other execute stages advance to the next higher execute stage. Thus, for
example, if the current state is (2,1,0,4; 2,1,1,0) then the next partial state
is (3,2,1,0; 1,0,0,0) and 4 Pc’s make new requests. The different ways that
these Pc’s can be assigned determine the ki’s in the next state, which can be
obtained by using an enumeration tree similar to the one described earlier.
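The bookkeeping in the example above can be written as a small function. This is only a sketch of the partial-state update described in the text; the function name and the use of tuples are illustrative choices.

```python
def advance(stages, queues):
    """Advance the partial state by one cycle for tp = tw + i*tc.

    stages -- (j1, ..., ji): number of Pc's in each execute stage
    queues -- (k1, ..., km): number of Pc's queued for each Mp
    Returns (next stages, queues after service, Pc's making new requests).
    """
    d = sum(1 for k in queues if k > 0)       # busy Mp's, one Pc served by each
    new_requests = stages[-1]                 # Pc's finishing the last stage
    next_stages = (d,) + tuple(stages[:-1])   # served Pc's enter stage 1, rest shift
    next_queues = tuple(max(k - 1, 0) for k in queues)
    return next_stages, next_queues, new_requests

if __name__ == "__main__":
    print(advance((2, 1, 0, 4), (2, 1, 1, 0)))
```

The Pc's counted in the third return value still have to be distributed over the memory queues, for instance with an enumeration over their possible choices.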


3.2  DISCRETE  MARKOV  CHAIN  MODELS  FOR  GEOMETRICALLY  DISTRIBUTED  TP 


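For our next model, let the processing time have a geometric distribution
given by,

$$\mathrm{Prob}[\,tp = tw + i \cdot tc\,] = \beta \alpha^{i}, \qquad \text{where } \beta = 1 - \alpha .$$

Then, the mean processing time is given by,

$$\sum_{i=0}^{\infty} \beta \alpha^{i} (tw + i \cdot tc)$$

$$\text{i.e. } tw \cdot \beta \sum_{i=0}^{\infty} \alpha^{i} + \beta \cdot tc \sum_{i=0}^{\infty} i \alpha^{i}$$

$$\text{i.e. } tw + tc \cdot \beta \sum_{i=0}^{\infty} i \alpha^{i}$$

$$\text{i.e. } tw + tc \cdot \beta \cdot \alpha / (1 - \alpha)^{2}$$

$$\text{i.e. } tw + \alpha \cdot tc / (1 - \alpha)$$

Thus any mean value of tp greater than tw can be modeled by appropriately
choosing α. This model is more general than the models of the previous section.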
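Choosing α for a desired mean amounts to inverting the last expression. The sketch below is illustrative (the function names are not from the thesis) and also checks the closed form against a truncated version of the defining sum.

```python
def mean_tp(alpha, tw, tc):
    """Closed-form mean of tp for Prob[tp = tw + i*tc] = (1 - alpha) * alpha**i."""
    return tw + alpha * tc / (1 - alpha)

def alpha_for_mean(mean, tw, tc):
    """Invert mean = tw + alpha*tc/(1 - alpha); requires mean >= tw."""
    r = (mean - tw) / tc          # r = alpha / (1 - alpha)
    return r / (1 + r)

def mean_tp_by_summation(alpha, tw, tc, terms=5000):
    """Truncated version of the defining sum, for comparison only."""
    beta = 1 - alpha
    return sum(beta * alpha ** i * (tw + i * tc) for i in range(terms))

if __name__ == "__main__":
    a = alpha_for_mean(15.0, tw=5.0, tc=10.0)
    print(a, mean_tp(a, 5.0, 10.0), mean_tp_by_summation(a, 5.0, 10.0))
```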


[Figure 3.5: Structure of the queueing model for geometrically distributed
processing time. The processors form a multiple-server station with service
time tc; the Mp modules have service time tc.]
Also, a single model with α as a parameter handles all cases where the mean
value of tp is greater than tw. The geometric distribution is a discrete analog
of the exponential distribution. The measurements reported in chapter 5 show
that the processing time distribution is indeed close to a shifted exponential.

Once again, the representative state of an exact discrete Markov chain
model is given by the vector (k0; k1,k2,...,km). Also, the ki (i = 0,...,m) sum
to n, ki is the number of Pc’s queued for Mp[i], 1≤i≤m, and k0 is the number of
Pc’s in execute state. The reduced partial state at the end of the current cycle
is given by the vector (0; j1,j2,...,jm), where ji = max(0, ki-1) for 1≤i≤m.
Note that

$$\sum_{i=1}^{m} j_{i} = n - k_{0} - d ,$$

where d is the number of non-zero ki’s. Now, up to k0+d Pc’s are potentially
available to be reassigned to the various queues. Let the next state be denoted
by (l0; l1,l2,...,lm). Figure 3.5 shows the structure of the queueing model.
Each Pc has a probability α of going back into the execute phase. Since l0
denotes the number of Pc’s that are in the execute phase during the next cycle,
the range of l0 is from 0 to k0+d. The value of l0 is governed by the following
probability function,

$$\mathrm{Prob}\{l_{0} = i\} = \binom{k_{0}+d}{i} \alpha^{i} \beta^{\,k_{0}+d-i}, \qquad 0 \le i \le k_{0}+d$$

The rest of the next state vector can be determined by using an enumeration
tree. Figure 3.6 shows a typical enumeration tree for an initial state of
(2; 2,0,0,0).

Once again, to reduce the number of states let us make the following
approximation. At the end of an Mp cycle, those Pc’s that were in the Mp queues
during the current cycle but not serviced are removed from the queues. They are
then reassigned to the Mp queues. Now, the state is characterized by the number
of Pc’s that make a request, i.e. the number of Pc’s that are queued during the
cycle. Thus, the number of states is n+1, viz. 0,1,2,...,n. Let the number of
Pc’s queued during the current cycle be i. This means that the other n-i Pc’s
are executing. If the i requests at the beginning of the current cycle result in
d (≤min(i,m)) busy Mp’s during the current cycle, then i-d Pc’s are left
unserviced. Therefore, during the next cycle at least i-d Pc’s are queued for Mp
service. Besides, each of the other n-i+d Pc’s has a probability β of making a
request to memory.


The probability that i-d Pc’s are left in the Mp queues at the end of the
current cycle is given by

$$\mathrm{CM}(m,d) \cdot S(i,d) / m^{i}$$

The above expression ensues from the fact that i Pc’s make random requests to m
Mp’s resulting in exactly d Mp’s being occupied. Also, the probability that j
out of the other n-i+d Pc’s make a request at the beginning of the next cycle is

$$\binom{n-i+d}{j} \beta^{j} \alpha^{\,n-i+d-j}$$

Therefore, the probability that i-d+j Pc’s make a request to Mp during the next
cycle, i.e. probability that the next state is i-d+j is

$$\frac{\mathrm{CM}(m,d) \cdot S(i,d)}{m^{i}} \binom{n-i+d}{j} \beta^{j} \alpha^{\,n-i+d-j}$$
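A compact sketch of this approximate model is given below. It is not the thesis's FORTRAN program. It assumes that CM(m,d) is the binomial coefficient and S(i,d) the number of onto assignments, that the average number of busy Mp's in a cycle with i requests is m*[1-(1-1/m)^i], and that plain power iteration is an adequate way to reach the steady state for 0 ≤ α < 1. The function names are illustrative. For a 4x2 system with α = 0.75 the sketch comes out close to the 0.9408 listed in Table 3.4.

```python
from math import comb

def onto(i, d):
    # assumed S(i,d): assignments of i distinct Pc's onto d Mp's, none empty
    return sum((-1) ** t * comb(d, t) * (d - t) ** i for t in range(d + 1))

def geometric_transition_matrix(n, m, alpha):
    # T[nxt][cur], where the state is the number of Pc's requesting in a cycle
    beta = 1 - alpha
    T = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for d in range(min(i, m) + 1):
            p_d = comb(m, d) * onto(i, d) / m ** i   # exactly d Mp's busy
            if p_d == 0.0:
                continue
            free = n - i + d            # Pc's that may issue a request next cycle
            for j in range(free + 1):
                p_j = comb(free, j) * beta ** j * alpha ** (free - j)
                T[i - d + j][i] += p_d * p_j
    return T

def busy_geometric(n, m, alpha, iterations=2000):
    T = geometric_transition_matrix(n, m, alpha)
    x = [1.0 / (n + 1)] * (n + 1)
    for _ in range(iterations):         # power iteration towards the steady state
        x = [sum(T[r][c] * x[c] for c in range(n + 1)) for r in range(n + 1)]
    return sum(x[i] * m * (1 - (1 - 1 / m) ** i) for i in range(n + 1))

if __name__ == "__main__":
    for a in (0.1, 0.25, 0.5, 0.75):
        print(a, busy_geometric(4, 2, a))
```

With α = 0 every Pc requests in every cycle and the result reduces to m*[1-(1-1/m)^n].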

Table  3.4  summarizes  some  of  the  numerical  results  obtained  from  the  approximate 
model.  A  listing  of  the  FORTRAN  program  for  this  model  is  included  in  Appendix 


TABLE 3.4

Average number of busy Mp’s
tp -- Geometric Distribution

    n x m      α        Analytic
                        Result

    4 x 2      0.1      1.8478
               0.25     1.7849
               0.5      1.5423
               0.75     0.9408
               90% confidence interval from simulation: (0.9130 , 0.9587)

    4 x 4      0.1      2.6079
               0.25     2.3679
               0.5      1.7920
               0.75     0.9739
               90% confidence intervals from simulation: (2.2875 , 2.3258)
                                                         (1.7382 , 1.8022)

    4 x 8      0.1      3.0773
               0.25     2.6824
               0.5      1.9012
               0.75     0.9877
               90% confidence interval from simulation: (2.6607 , 2.6831)

    8 x 4      0.1      3.5442
               0.25     3.4213
               0.5      2.9809
               0.75     1.8620
               90% confidence interval from simulation: (2.7675 , 2.8254)

    8 x 8      0.1      5.0243
               0.25     4.5904
               0.5      3.5224
               0.75     1.9390
               90% confidence intervals from simulation: (4.3508 , 4.4713)
                                                         (3.4158 , 3.5118)

    16 x 8     0.1      6.9461
               0.25     6.7058
               0.5      5.8650
               0.75     3.7050
               90% confidence interval from simulation: (5.3855 , 5.5805)

    16 x 16    0.1      9.8718
               0.25     9.0441
               0.5      6.9848
               0.75     3.8693
               90% confidence interval from simulation: (6.7643 , 6.9240)

A-6. This model will be used in chapter 5 to predict the performance of C.mmp.
For tp>tw this model is fairly realistic: the processing time is modeled as a
geometrically distributed random variable and the memory cycle time is a
constant.


3.2.1  Extensions 


The basic geometric model can be used to model systems in which each Pc has
a private cache memory (Mc). Let the access time of the cache be tf (≤tc); and
let x denote the probability of making an access to the cache. Therefore, a
fraction (1-x) of all memory requests is to Mp. In the queueing model shown in
Fig. 3.7, α denotes the probability that the execution phase goes through
another cycle. Thus, β = 1-α is the probability of a Pc finishing execution and
making a request to memory. Thus, β1 = β*(1-x) and β2 = β*x.
As before,

$$\mathrm{Prob}\{tp = tw + i \cdot tc\} = \beta_{1} \alpha^{i} \qquad \text{for } i = 0, 1, 2, \ldots$$

and,

$$\mathrm{Prob}\{tp = tc - tf + i \cdot tc\} = \beta_{2} \alpha^{i} \qquad \text{for } i = 0, 1, 2, \ldots$$

The expected value of tp, given that all accesses are to Mp, is given by

$$E[\,tp \mid Mp\,] = tw + tc \cdot \beta_{1} \cdot \alpha / (1 - \alpha)^{2}$$

Similarly,

$$E[\,tp \mid Mc\,] = tc - tf + tc \cdot \beta_{2} \cdot \alpha / (1 - \alpha)^{2}$$

Hence, the unconditional expected value of tp is

$$E[tp] = (1-x) \cdot E[\,tp \mid Mp\,] + x \cdot E[\,tp \mid Mc\,]$$

$$= (1-x) \bigl[\, tw + tc (1-x) \alpha / (1-\alpha) \bigr] + x \bigl[\, tc - tf + tc \cdot x \cdot \alpha / (1-\alpha) \bigr]$$

If  E[tp]  is  Known,  u.  can  be  computed  from  the  above  equation.  This  model  is 
fairly  general  in  that  the  parameters  x  and  E[tp],  tc,  tw  and  tf  can  be  chosen 
arbitrarily,  subject  to  E[tp]  >  tw+x*(tc-tf-tw)  and  tfstc.  The  distribution  of 
tp  is  now  the  sum  of  two  geometries. 

Note that this model, as viewed by the Mp, behaves exactly like the model
for geometrically distributed tp shown in Fig. 3.5. The β in Fig. 3.5 is
equivalent to the β1 in Fig. 3.7, and the α in Fig. 3.5 is equivalent to the α+β2
in Fig. 3.7. Let R denote the average number of busy Mp's for the model of Fig.
3.5, but with β=β1 and α=1-β1. The values of tw and tc are kept unchanged. Now,
in the system with the cache, the average number of busy Mp's is also R. Note
that the expected value of tp is different for the two cases. Now, by
definition, for every 1-x accesses to Mp there are x accesses to Mc. Consider an
interval equal to T Mp cycles. During this interval there are R*T busy Mp
cycles. Hence, there are R*T*x/(1-x) accesses to Mc. Therefore, the number of
instructions executed in one Mp cycle is

R + R*x/(1-x)
i.e. R/(1-x)

Thus, the average number of equivalent Mp cycles is R/(1-x).

For example, let us compute the average number of effective busy Mp cycles
for ta=tw=5, tc=10, tf=1, x=0.75, E[tp]=15, and n=m=8. The equation for E[tp] in
terms of α gives α=28/53. Hence β1=(1-x)*(1-α)=0.25*25/53. Using β=0.25*25/53
for the model of Section 3.2, we get R=0.9371. Hence, the average number of
effective busy Mp cycles is 0.9371/0.25 = 3.7484.


Figure 3.8 Structure of McCredie's Memory Interference Model
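
The arithmetic of this example can be checked with a short sketch. It is only an illustration: the function names are ours, and it assumes the reading of the E[tp] equation in which E[tp] is linear in r = α/(1-α). The value R=0.9371 is taken from the text, not recomputed.

```python
from fractions import Fraction


def solve_alpha(e_tp, tw, tc, tf, x):
    """Solve E[tp] = (1-x)*[tw + tc*(1-x)*r] + x*[tc-tf + tc*x*r]
    for alpha, where r = alpha/(1-alpha). Names are illustrative."""
    base = (1 - x) * tw + x * (tc - tf)       # value of E[tp] when alpha = 0
    slope = tc * ((1 - x) ** 2 + x ** 2)      # coefficient of r
    r = (e_tp - base) / slope
    if r < 0:
        raise ValueError("E[tp] is below tw + x*(tc-tf-tw)")
    return r / (1 + r)


def effective_busy(r_busy, x):
    """Average number of equivalent Mp cycles, R/(1-x)."""
    return r_busy / (1 - x)


x = Fraction(3, 4)
alpha = solve_alpha(Fraction(15), Fraction(5), Fraction(10), Fraction(1), x)
beta1 = (1 - x) * (1 - alpha)

print(alpha)                        # 28/53
print(beta1)                        # 25/212, i.e. 0.25*25/53
print(effective_busy(0.9371, 0.75))
```

Exact rational arithmetic is used so that the fraction 28/53 quoted in the text can be compared directly.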


3.3  McCREDIE’s  EXPONENTIAL  SERVER  MODEL 


As mentioned in Chapter 2, if the memory cycle time and the processing time
are assumed to be exponential, the queueing results of Jackson [JackJ63] can be
used. The structure of McCredie's memory interference model [McCr73] is depicted
in Fig. 3.8. In terms of the notation used in this chapter, λ=1/(tp-tw) (if
α=0), μ=1/tc and α is the probability of accessing a cache memory whose access
time is assumed to be negligible. The time from the completion of one reference
to main memory until the next access is exponentially distributed with mean
1/(λ*β). The model allows the first Mp module to have a different cycle time
1/v. The probability that a request to main memory is to the first Mp module is
f; the requests to the other Mp modules being uniformly distributed. Let k be
the number of Pc's queued. Using Jackson's formulae and some clever grouping of
terms† the execution rate is obtained as

Σ (n-k)*λ*P(k)   (summed over k=0,...,n)


†The reader is referred to [McCr73] for the details of the derivation.
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where  P(k)  - 


(n-k)l  ±  H(k).T(k) 


and  W(k)  -  *  n!/(n-k)! 


and  T(k)  -  Z  (-?*  i 

J-O  J 


Figure 3.9 shows some of the results predicted by the model. The advantage of
this model is that it allows a cache memory and an Mp module with a different
speed and a different access probability. Note that with the presence of the
cache λ≠1/(tp-tw); but λ=1/(tp-β*tw).


3.4  STRECKER’S  ANALYSIS 

This section will briefly review Strecker's analysis of multiprocessors
with tp>tw [StreW70]. The processing time is assumed to be a constant. The number
of processors queued is, in general, less than n. The probability that a Pc is
queued is denoted by pm. Then, assuming a binomial distribution for the queued
processors, the execution rate is

m*[1-(1-pm/m)^n]

Now, because the number of Pc's queued is binomially distributed, the average
number of Pc's queued is n*pm. Hence,

pm = (average number of Pc's queued)/n

Strecker's flow diagram of the instruction execution is shown below.


Strecker states that the average number of Pc's not queued is the product of the
average unit execution rate and the effective Pc delay tp-tw†. Thus,

pm = 1 - (m/n/tc) * [1-(1-pm/m)^n] * (tp - tw)

which is an n-th order polynomial equation in pm which has only one solution for
pm in the interval (0,1). This value of pm is used in the earlier expression for
the execution rate. This model is fairly simple and good for larger values of
tp.


†This follows directly from Little's formula, L=λ*W [cf. LittJ61].
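
As an illustration of how the equation for pm might be solved numerically, the sketch below uses bisection on the interval (0,1). Bisection is our choice here, not necessarily Strecker's method, and the function names are ours. It relies on the fact that h(p) = p - 1 + (m/n/tc)*[1-(1-p/m)^n]*(tp-tw) is increasing in p, with h(0) = -1 and h(1) > 0 when tp>tw.

```python
def strecker_pm(n, m, tp, tw, tc, iters=200):
    """Root of pm = 1 - (m/n/tc)*[1-(1-pm/m)^n]*(tp-tw) in (0,1), by bisection."""
    def h(p):
        return p - 1.0 + (m / (n * tc)) * (tp - tw) * (1.0 - (1.0 - p / m) ** n)

    lo, hi = 0.0, 1.0            # h(lo) < 0 and h(hi) > 0 for tp > tw
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def strecker_rate(n, m, tp, tw, tc):
    """Execution rate m*[1-(1-pm/m)^n], i.e. the average number of busy Mp's."""
    pm = strecker_pm(n, m, tp, tw, tc)
    return m * (1.0 - (1.0 - pm / m) ** n)


# One Pc and one Mp: pm = tc/(tc+tp-tw), so with tp=15, tw=5, tc=10 we expect 0.5.
print(strecker_pm(1, 1, 15, 5, 10))
print(strecker_rate(8, 8, 15, 5, 10))
```

For a single Pc and a single Mp the equation is linear, which gives a convenient check on the sketch.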
3.5  THE EFFECT OF CACHE ON SYSTEMS WITH tp>tw


One of the techniques used to increase the execution rate of uniprocessor
systems involves the use of a fast cache memory, Mc. In multiprocessor systems,
the cache not only provides a memory with a smaller access time but also reduces
the traffic through the crosspoint switch. This reduces the memory interference
and reduces the amount of time the processor spends waiting for Mp service. In
fact, the use of private caches for the Pc's is being considered for CMU's
C.mmp.

In this section, we shall characterize the processing time by a single
constant value tp, with Pij = 1/m. Let the cache have an access time tf (<ta) and
a rewrite time tr (<tw). Since tp>tw and tr<tw, the cache always recovers before
the Pc can make its next request. Let α be the probability of accessing Mc, i.e.
the probability that the cache contains the information needed by the Pc. The
probability of accessing Mp is β = 1-α.

Now, the time needed to execute one unit instruction out of cache is
Wc=tp+tf. Let Wm be the average time needed to execute one unit instruction out
of Mp. Hence, the average time needed to execute one unit instruction is

Wavg = α*Wc + β*Wm

Therefore, the execution rate is n/Wavg.

The problem now reduces to evaluating Wm. Let us focus our attention on a
single Mp unit. Further, let us assume that the number of queued Pc's for that
Mp unit follows the binomial distribution; and let pm be the probability that a
Pc is queued for one of the Mp's. Hence, the probability of being queued for the
Mp unit under consideration is pm/m, since all Mp's have the same speed and
Pij=1/m. From the binomial distribution for the queued Pc's, it follows that L,
the average number of Pc's queued for the Mp, is n*pm/m. The rate λ at which Pc's
are served by this Mp is [1-(1-pm/m)^n]/tc. Using Little's formula, L=λ*W, we
obtain the average waiting time for an Mp as

n*pm/m * tc / [1-(1-pm/m)^n]

Therefore Wm, the average time for one instruction out of Mp, is

tp-tw + n*pm/m * tc / [1-(1-pm/m)^n]

Hence,

Wavg = α*(tp+tf) + β*{tp-tw + n*pm/m * tc / [1-(1-pm/m)^n]}

Now, the only undetermined quantity is pm. In T Mp cycles the total number
of busy Mp cycles is

T1 = m * [1-(1-pm/m)^n] * T

Hence, the total number of busy cache cycles is

T2 = α/β * {m * [1-(1-pm/m)^n] * T}

and the total number of unit instructions is the sum of the above two
expressions, i.e. T1+T2. Therefore, the unit execution rate is

m/(β*tc) * [1-(1-pm/m)^n]

Recall that we had earlier found the execution rate to be n/Wavg. Equating the
two and rearranging the terms we get

pm = 1 - m/(n*β*tc) * [α*(tp+tf) + β*(tp-tw)] * [1-(1-pm/m)^n]

The above equation can be solved iteratively for pm. A FORTRAN program that
computes the execution rate for this model is listed in Appendix A-7. Figure
3.10 shows the execution rate of an 8x8 and a 16x16 multiprocessor system for
various values of the other parameters. Some simulation confidence intervals are
also depicted. Note that with α=0, this model yields the same results as
Strecker's model described in Section 3.4. The major advantage of this model is
that it allows the cache speed to be a control parameter of the analysis.
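
The iterative solution can be sketched as follows. This is not the FORTRAN program of Appendix A-7; it is a minimal Python illustration that solves the equation for pm by bisection (our choice of method) and then evaluates the unit execution rate and Wavg. The function and variable names are ours.

```python
def cache_model(n, m, tp, tw, tc, tf, alpha, iters=200):
    """Return (pm, execution rate, Wavg) for the cache model of this section."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    beta = 1.0 - alpha
    k = m / (n * beta * tc) * (alpha * (tp + tf) + beta * (tp - tw))

    def busy(p):
        # Probability that a given Mp is busy: 1-(1-pm/m)^n
        return 1.0 - (1.0 - p / m) ** n

    lo, hi = 0.0, 1.0                 # p - 1 + k*busy(p) is increasing in p
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - 1.0 + k * busy(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    pm = 0.5 * (lo + hi)

    rate = m / (beta * tc) * busy(pm)                  # unit instructions per time unit
    w_m = tp - tw + (n * pm / m) * tc / busy(pm)       # time per instruction out of Mp
    w_avg = alpha * (tp + tf) + beta * w_m
    return pm, rate, w_avg


pm, rate, w_avg = cache_model(n=8, m=8, tp=15, tw=5, tc=10, tf=1, alpha=0.5)
print(pm, rate, 8 / w_avg)   # rate and n/Wavg agree at the solution
```

At the solution the two expressions for the execution rate, m/(β*tc)*[1-(1-pm/m)^n] and n/Wavg, coincide, which is a useful consistency check on the derivation.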

3.6  CONCLUDING  REMARKS 


In general, if the processing time is greater than the memory rewrite time,
then in the absence of memory contention one instruction is executed by each Pc
every ta+tp time units. Thus, the theoretical maximum value of the average
number of busy Mp's is n*tc/(ta+tp). Hence, if m>n, even in the absence of
memory interference, the Mp's will have idle periods. Comparing Fig. 2.11 and
Fig. 3.3a, the curves for tp>tw reach their asymptotic maximum for smaller
values of m. Systems with tp>tw are generally processor speed limited and
relatively small performance improvement will be obtained if the memory speed is
increased.


Figure 3.10a The Effect of Cache on an 8x8 system
(horizontal axis: Probability of accessing cache)
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CHAPTER  4 

MULTIPROCESSOR  SYSTEMS  WITH  TP<TW 


In this chapter, an approximate model is proposed for multiprocessor
systems with tp<tw and Pij=1/m. The processing time will be assumed to be a
constant. In this case, a classical queueing model is made difficult for the
following reason. In conventional queueing models, the service center can start
serving a new customer as soon as the last customer leaves. For core memories,
due to the rewrite time, the memory has to wait. However, for tp≥tw, this
problem can be surmounted by delaying the Pc in the Mp queue for an additional
time equal to tw and reducing the effective execution time by the same amount.
Hence, with this modification, the Mp can start serving the next Pc in its queue
when the current Pc leaves the queue. But, with tp<tw, the Pc can make a new
request before the memory that served it last recovers. Strictly speaking, a
discrete Markov chain model can be formulated by making the basic time interval
equal to the highest common factor of tp, tw, and ta. However, the number of
possible epochs in a memory cycle, the size of the state vector and the
number of states are very large.
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4.1  AN  APPROXIMATE  MODEL  FOR  TP<TW 


The results of the discrete Markov chain model of Section 2.2 will be used
to obtain an approximation of the execution rate. Consider an Mp cycle in which i
Mp’s  are  initially  busy.  If  tp=tw,  the  i  active  Pc’s  make  a  request  at  the  end 
of  the  cycle.  However,  if  tp<tw,  an  active  Pc  makes  a  new  request  before  the  Mp 
that  served  it  recovers.  If  this  request  is  made  to  an  Mp  that  is  not  busy,  the 
new  request  can  be  served  immediately.  Consider  an  Mp  cycle  in  which  i  Mp’s  are 
initially  busy.  During  this  cycle  i  Pc’s  are  initially  active.  Now,  these  i  Pc’s 
can  receive  more  service  if  they  make  their  next  request  to  the  m-i  idle  Mp’s. 
Now,  if  this  request  is  made  to  an  idle  Mp,  the  effective  service  time  (not 
including  waiting  time)  for  the  active  Pc  is  ta+tp.  However,  if  this  request  is 
made  to  a  busy  Mp  there  is  no  increase  in  the  execution  rate  due  to  the  fact 
that  tp<tw.  Hence,  the  effective  service  time  is  tc. 

Let bi denote the probability that i Mp's are busy, obtained from the exact
discrete Markov chain model for tp=tw. Now, the average number of idle Mp's that
receive the next request is given by

(m-i) * [1 - (1-1/m)^i]

Thus, the probability that an active Pc gets serviced by an idle Mp is

(m-i) * [1 - (1-1/m)^i] / i

Note that the above is a conditional probability based on the fact that i Pc's
are active. Therefore, the unconditional probability that the effective service
time of an active Pc is ta+tp is given by

Σ bi * (m-i) * [1 - (1-1/m)^i] / i   (summed over i)

Let f be the value of the above expression. The number of active Pc's is
obtained from the discrete Markov chain model for tp=tw; denoted by X. The
expected value of the service time for an active Pc is

f * (ta+tp) + (1-f) * tc

Hence, the execution rate, expressed as unit instructions per second, is given
by

X / [f*(ta+tp) + (1-f)*tc]

The average number of busy Mp cycles can be obtained by multiplying the above
expression by tc.


TABLE 4.1

Average Number of Busy Mp's
tc=10  ta=tw=5

n x m       tp=5      tp=4      tp=3      tp=2      tp=1      tp=0

2 x 2      1.5000    1.5385    1.5790    1.6216    1.6667    1.7143
2 x 4      1.7506    1.8451    1.9512    2.0702    2.2047    2.3579
2 x 8      1.8750    2.0215    2.1928    2.3958    2.6403    2.9403
2 x 16     1.9375    2.1182    2.3362    2.6041    2.9414    3.3792
4 x 2      1.7500    1.7722    1.7949    1.8182    1.8421    1.8667
4 x 4      2.6210    2.6947    2.7832    2.8721    2.9669    3.0681
4 x 8      3.2652    3.4429    3.6413    3.8633    4.1145    4.4401
4 x 16     3.6268    3.9051    4.2297    4.6131    5.0729    5.6345
8 x 2      1.8750    1.8868    1.8987    1.9108    1.9231    1.9355
8 x 4      3.2657    3.3148    3.3654    3.4176    3.4714    3.5269
8 x 8      4.9471    5.1023    5.2676    5.4439    5.6324    5.8345
8 x 16     6.3149    6.6579    7.0403    7.4692    7.9539    8.5057
16 x 2     1.9375    1.9436    1.9497    1.9558    1.9620    1.9683
16 x 4     3.6270    3.6538    3.6811    3.7088    3.7369    3.7654
16 x 8     6.3154    6.4163    6.5204    6.6280    6.7392    6.8543
16 x 16    9.6258    9.9299    10.254    10.600    10.969    11.366
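
The computation of this section can be sketched in a few lines. The sketch assumes that the probabilities bi are supplied from the exact model for tp=tw (which is not reproduced here) and that the number of active Pc's is the mean Σ i*bi. The function name and the two small distributions used below are ours; the distributions are inferred from the tp=tw entries 1.5000 and 1.7500 of Table 4.1 for the 2x2 and 4x2 systems, where only i=1 and i=2 can occur.

```python
def approx_busy_mps(b, m, ta, tp, tc):
    """Approximate average number of busy Mp's for tp<tw.

    b maps i (number of busy Mp's) to its probability bi for tp=tw.
    """
    active = sum(i * p for i, p in b.items())       # average number of active Pc's
    f = sum(p * (m - i) * (1.0 - (1.0 - 1.0 / m) ** i) / i
            for i, p in b.items() if i > 0)         # P(service time is ta+tp)
    service = f * (ta + tp) + (1.0 - f) * tc        # expected service time
    return tc * active / service


b_2x2 = {1: 0.5, 2: 0.5}      # assumption: inferred from the 2x2 entry 1.5000
b_4x2 = {1: 0.25, 2: 0.75}    # assumption: inferred from the 4x2 entry 1.7500

for tp in (4, 1, 0):
    print(tp, round(approx_busy_mps(b_2x2, 2, 5, tp, 10), 4))
print(round(approx_busy_mps(b_4x2, 2, 5, 1, 10), 4))
```

Under these assumptions the sketch reproduces the 2x2 and 4x2 rows of Table 4.1.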

Table 4.1 presents the results obtained for various values of m, n, ta, tp,
and tc. Since this model is approximate, it is meaningful to compare its results
with a confidence interval obtained from simulation. Table 4.2 compares the
results of this model and Strecker's model [StreW70] with simulation results. The
results of this model are within 5% of the simulation. Figures 4.1 and 4.2
illustrate the effects of n and m on the MpAR.

4.2  STRECKER’S  MODEL 


Strecker [StreW70] uses his approximate model for tp=tw to analyze tp<tw. He
defines the probability of an active Pc making a request to an occupied Mp as

p(occ) = (average number of occupied Mp's) / m

The probability of a request to an unoccupied Mp is

p(unocc) = 1 - p(occ)

Using his model for tp=tw,

p(occ) = 1 - (1-1/m)^n

Thus, the average amount of time required to execute an instruction is

E[t] = p(occ)*tc + p(unocc)*(ta+tp)
     = tc + (1-1/m)^n * (tp-tw)

Therefore, the average number of busy Mp cycles is

tc * m*[1-(1-1/m)^n] / E[t]

Note that Strecker assumes a constant probability for an Mp being occupied.
In the model of Sec. 4.1 the probability of an Mp being occupied depends on the
number of busy Mp's during the cycle. Simulation results summarized in Table 4.2
show that the model proposed in Section 4.1 is better than Strecker's model.

[Figures 4.1 and 4.2: vertical axis, Effective MpAR]
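
Strecker's expressions for tp<tw are simple enough to evaluate directly. The following sketch (function name ours) computes the average number of busy Mp cycles from n, m, ta, tw and tp, taking tc = ta+tw as elsewhere in this thesis.

```python
def strecker_busy_mps(n, m, ta, tw, tp):
    """Average number of busy Mp cycles in Strecker's model for tp<tw."""
    tc = ta + tw
    p_unocc = (1.0 - 1.0 / m) ** n          # request goes to an unoccupied Mp
    p_occ = 1.0 - p_unocc                   # request goes to an occupied Mp
    e_t = p_occ * tc + p_unocc * (ta + tp)  # average time per instruction
    return tc * m * p_occ / e_t


print(round(strecker_busy_mps(2, 2, 5, 5, 1), 4))
print(round(strecker_busy_mps(4, 2, 5, 5, 1), 4))
print(round(strecker_busy_mps(2, 4, 5, 5, 3), 4))
```

With ta=tw=5 these calls correspond to entries in the Strecker's Model columns of Tables 4.2a and 4.2b.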

4.3  CONCLUDING  REMARKS 

With one Pc and one Mp there is no advantage gained by the fact that tp<tw.
However, if an active Pc can make its next request to an unoccupied Mp,
significant increases in the execution rate can be obtained. The theoretical
maximum execution rate for each Pc is 1/(ta+tp); the average number of busy Mp
cycles is limited by n*tc/(ta+tp), if m is large enough; else it is bounded by m.


TABLE 4.2a

Average number of busy Mp's
Comparison of Analytic Models and Simulation Results
ta = tw = 5   tp = 1   tc = 10

n x m      Analytic Model   90% confidence interval   Strecker's
           (Section 4.1)    from simulation           Model

2 x 2         1.6667        (1.5591 , 1.6053)           1.6667
2 x 4         2.2047        (2.1342 , 2.2239)           2.2531
2 x 8         2.6403        (2.6039 , 2.6421)           2.7027
2 x 16        2.9414        (2.9240 , 2.9858)           2.9880
4 x 2         1.8421        (1.7746 , 1.7882)           1.9231
4 x 4         2.9669        (2.8102 , 2.9072)           3.1306
4 x 8         4.1145        (3.9982 , 4.1092)           4.3245
4 x 16        5.0729        (5.0587 , 5.1074)           5.2682
8 x 2         1.9231        (1.8514 , 1.9131)           1.9953
8 x 4         3.4714        (3.2850 , 3.3968)           3.7497
8 x 8         5.6324        (5.3417 , 5.5750)           6.0879
8 x 16        7.9539        (7.7336 , 7.9873)           8.4755
16 x 2        1.9620        (1.9225 , 1.9551)           2.0000
16 x 4        3.7369        (3.6047 , 3.7187)           3.9758
16 x 8        6.7392        (6.4601 , 6.5891)           7.4052
16 x 16       10.969        (10.5279 , 10.7559)         12.014


TABLE 4.2b

Average number of busy Mp's
Comparison of Analytic Models and Simulation Results
ta = tw = 5   tp = 3   tc = 10

n x m      Analytic Model   90% confidence interval   Strecker's
           (Section 4.1)    from simulation           Model

2 x 2         1.5790        (1.5368 , 1.5457)           1.5789
2 x 4         1.9512        (1.9058 , 2.0012)           1.9718
2 x 8         2.1928        (2.2019 , 2.2059)           2.2140
2 x 16        2.3362        (2.3441 , 2.3496)           2.3507
4 x 2         1.7949        (1.7625 , 1.7767)           1.8987
4 x 4         2.7832        (2.6950 , 2.8028)           2.9191
4 x 8         3.6410        (3.6021 , 3.6797)           3.7502
4 x 16        4.2297        (4.2319 , 4.2863)           4.3056
8 x 2         1.8987        (1.8530 , 1.9054)           1.9937
8 x 4         3.3654        (3.2646 , 3.3402)           3.6731
8 x 8         5.2676        (5.1470 , 5.2406)           5.6386
8 x 16        7.0403        (6.9457 , 7.1531)           7.3289
16 x 2        1.9497        (1.9212 , 1.9551)           2.0000
16 x 4        3.6811        (3.5241 , 3.7189)           3.9679
16 x 8        6.5204        (6.3928 , 6.4759)           7.2261
16 x 16       10.254        (9.9522 , 10.350)           11.093
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CHAPTER  5 

EMPIRICAL  MEASUREMENTS,  PARAMETER  ESTIMATION  AND  MODEL  VALIDATION 


An  important  aspect  in  the  development  of  mathematical  models  is  the 
modeling  process:  the  abstraction  of  the  complex  physical  process  to  a  model 
that  is  mathematically  tractable.  This  chapter  is  devoted  to  validating  the 
mathematical  model. 

One of the main parameters of the models developed in this thesis is the
processing time. In order to evaluate the probability distribution and the mean
of the processing time, measurements were made of the dynamic usage of the
PDP-11/20 instruction set. B. Aygun's Dynamic Analysis and Measurement
Environment [AyguB73] was used to simulate the execution of over 34,500 PDP-11
instructions in carefully selected main-loop portions of four programs. The
PDP-11 Processor Handbook, Interface Manual and Engineering Drawings were used
to obtain the instruction timing for the various instructions and addressing
modes. This information can be used to predict the performance of a
multiprocessor system like C.mmp, which uses the PDP-11/20 as the processor.
Note that the analytic models in this thesis are general; C.mmp is used in this
chapter as a typical case for validation and illustration of the use of these
models.
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5.1  PDP-11/20  OVERVIEW 

A  brief  discussion  of  the  instruction  timing  and  formats  is  presented  here. 

The PDP-11/20 processor has five major states: fetch, source, destination,
execute and service. The first four states are used during normal operation;
service is used during special operations, such as traps and interrupts.

Fetch:  locates  and  decodes  an  instruction.  When  fetch  is  completed,  the 
processor  enters  another  major  state,  depending  on  the  type  of  instruction 
decoded.  It  is  possible  to  go  from  fetch  to  any  other  state,  including  back 
to  fetch.  Every  instruction  starts  by  first  entering  the  fetch  state. 

Source:  decodes  the  source  field  of  a  double-operand  instruction  and 
transfers  the  source  operand  to  the  appropriate  location.  The  source  major 
state  is  entered  only  if  the  instruction  is  a  double-operand  type. 

Destination: decodes the destination field of the appropriate instruction.
Destination fields are present in both single and double-operand
instructions. The destination operand is accessed and transferred to the
appropriate location.


Execute: uses the data obtained during previous major states to perform the
specified operation. During this state arithmetic operations, logic
functions, and tests are performed, and the destination location is updated
if required.

Service: used to execute special operations, such as interrupts, traps, etc.


Figure 5.1a Single Operand Instruction Format
(fields MODE, @ and Rn form the Destination Address)

   *    = Specifies Direct or Indirect Address
   **   = Specifies How Register will be used
   ***  = Specifies One of 8 General Purpose Registers

Figure 5.1b Double Operand Instruction Format

   bits 15-12   OP CODE
   bits 11-6    Source Address:       MODE (11-10), @ (9), Rn (8-6)
   bits 5-0     Destination Address:  MODE (5-4),   @ (3), Rn (2-0)

   *    = Direct/Deferred Bit for Source and Destination Address
   **   = Specifies How Selected Registers are to be used
   ***  = Specifies a General Register

Although  the  major  states  follow  the  sequence  of  fetch,  source,  destination, 
execute,  and  service,  not  all  major  states  are  required  for  every  instruction. 

The  processor  enters  only  those  major  states  necessary  to  execute  the  current 
instruction.  The  minimum  sequence  is  from  a  fetch  of  one  instruction  directly  to 
the  fetch  of  the  next  instruction.  The  maximum  sequence  is  fetch,  source, 
destination,  execute,  service  and  back  to  fetch.  The  Interface  Manual  contains 
more  detailed  information  about  the  states  needed  for  various  instructions. 

The  instruction  format  for  all  single  operand  instructions  (such  as  clear, 
increment,  test)  is  shown  in  Fig.  5.1a.  Operations  that  imply  two  operands  (such 
as  add,  subtract,  move  and  compare)  are  handled  by  instructions  that  specify  two 
addresses.  The  first  operand  is  called  the  source  operand,  the  second  the 
destination  operand.  Bit  assignments  in  the  source  and  destination  address 
fields  may  specify  different  modes  and  different  general  registers.  The 
instruction format for the double operand instruction is depicted in Fig. 5.1b.

Table  5.1a  summarizes  the  four  basic  modes  used  with  direct  addressing;  the  four 
basic  modes  used  with  deferred  addressing  are  described  in  Table  5.1b.  The 
PDP-11  Processor  Handbook  contains  numerous  illustrative  examples. 
TABLE 5.1a

Direct Addressing Modes of PDP-11

Binary   Name            Assembler   Function
Code                     Syntax

0 0 0    Register        Rn          Register contains operand.

0 1 0    Autoincrement   (Rn)+       Register is used as a pointer to
                                     sequential data then incremented.

1 0 0    Autodecrement   -(Rn)       Register is decremented then used
                                     as a pointer.

1 1 0    Index           X(Rn)       Value X is added to (Rn) to produce
                                     address of the operand. Neither X
                                     nor (Rn) is modified.
TABLE 5.1b

Deferred or Indirect Addressing Modes of PDP-11

Binary   Name            Assembler     Function
Code                     Syntax

0 0 1    Register        @Rn or (Rn)   Register contains the address of
         Deferred                      the operand.

0 1 1    Autoincrement   @(Rn)+        Register is first used as a pointer
         Deferred                      to a word, then incremented by 2.

1 0 1    Autodecrement   @-(Rn)        Register is decremented by two and
         Deferred                      then used as a pointer to a word
                                       containing the address of the
                                       operand.

1 1 1    Index           @X(Rn)        Value X (stored in a word following
         Deferred                      the instruction) and (Rn) are added
                                       and the sum is used as a pointer to
                                       the word containing the address of
                                       the operand.
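
The field layout of Fig. 5.1b and the mode codes of Tables 5.1a and 5.1b can be illustrated by a small decoder. This is only a sketch: the function name is ours, and the example word 012102 (octal), which the standard PDP-11 encoding assigns to MOV (R1)+,R2, is chosen for illustration and does not come from the measurements in this chapter.

```python
# Three-bit mode codes: the low-order bit is the direct/deferred (@) bit.
MODE_NAMES = {
    0b000: "Register",      0b001: "Register Deferred",
    0b010: "Autoincrement", 0b011: "Autoincrement Deferred",
    0b100: "Autodecrement", 0b101: "Autodecrement Deferred",
    0b110: "Index",         0b111: "Index Deferred",
}


def decode_double_operand(word):
    """Split a 16-bit double operand instruction into its fields."""
    def address_field(shift):
        bits = (word >> shift) & 0o77       # six-bit address field
        mode = bits >> 3                    # MODE bits plus the @ bit
        return {"mode": MODE_NAMES[mode],
                "deferred": bool(mode & 1),
                "reg": bits & 0o7}

    return {"opcode": (word >> 12) & 0o17,  # bits 15-12
            "src": address_field(6),        # bits 11-6
            "dst": address_field(0)}        # bits 5-0


decoded = decode_double_operand(0o012102)
print(decoded)
```

For this word the source field decodes to autoincrement mode on register 1 and the destination field to register mode on register 2.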
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5.2  VALIDATION  OF  UNIT  INSTRUCTION  CONCEPT  AND  ESTIMATION  OF  TP 


In this thesis, processor behavior has been modeled as an ordered sequence
consisting of a memory request followed by some processing. A study of the
PDP-11/20 instruction timing shown here exhibits a similar behavior. The access
time starts when the Mp receives a request and ends when the data is received by
the Pc.

Each PDP-11/20 instruction can be broken down into sequences of unit
instructions. Figure 5.2 shows the instruction timing for the various addressing
modes and instruction types. The PDP-11 Processor Handbook lists the total time
for executing the various instructions. The access time is assumed to be 450 ns.
Figure 5.2 is consistent with the handbook. Table 5.2 summarizes the
contribution of the different cases to the effective processing time. Note that
these figures take into account the delay due to the PDP-11 Unibus. Table 5.3
contains the instruction mix obtained by using Aygun's DAME†. Table 5.4 gives
the effective processing times and their relative frequencies; the cumulative
probability distribution function is plotted in Fig. 5.3. The access time of the
memory was assumed to be 450 ns and the basic clock period of the PDP-11/20
processor was taken to be its nominal value of 140 ns. The average processing
time obtained from this analysis is 1150 ns. The dotted curve in Fig. 5.3 is a
shifted exponential distribution given by

Prob{tp<x} = 1 - exp[-(x-450)/700]   for x>450
           = 0                       for x<450

†The author acknowledges B. Aygun's assistance in obtaining the instruction mix.


Figure 5.2 PDP-11/20 Instruction Timing
(a) Single Operand Instructions and Double Operand Instructions with srcmode=0
(b) Double Operand Instructions with srcmode≠0

TABLE 5.2  Effective Processing Time

TABLE 5.3  Instruction Mix

TABLE 5.4

Relative Frequency Distribution of the Effective Processing Time

Value in ns.    Frequency

    550          14.49 %
    700          10.20 %
    840          14.49 %
    910          11.43 %
    970           1.53 %
   1050           1.53 %
   1190          11.93 %
   1260           5.61 %
   1470           1.53 %
   1610           7.35 %
   1820          11.84 %
   2100           8.16 %

Average Value = 1150 ns.

Figure 5.3 Cumulative probability distribution of the effective processing time of PDP-11/20
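
The average value quoted for Table 5.4 and the fit of the shifted exponential can be checked with a short sketch. The table entries are used as they appear in this copy; their frequencies sum to slightly more than 100%, so the sketch normalizes by the actual total. The variable and function names are ours.

```python
import math

# (effective processing time in ns, relative frequency in percent), from Table 5.4
TABLE_5_4 = [
    (550, 14.49), (700, 10.20), (840, 14.49), (910, 11.43),
    (970, 1.53), (1050, 1.53), (1190, 11.93), (1260, 5.61),
    (1470, 1.53), (1610, 7.35), (1820, 11.84), (2100, 8.16),
]

total_percent = sum(freq for _, freq in TABLE_5_4)
mean_ns = sum(value * freq for value, freq in TABLE_5_4) / total_percent


def empirical_cdf(x):
    """Fraction of unit instructions with effective processing time <= x."""
    return sum(freq for value, freq in TABLE_5_4 if value <= x) / total_percent


def shifted_exp_cdf(x, shift=450.0, scale=700.0):
    """Prob{tp < x} for the shifted exponential distribution of Fig. 5.3."""
    return 0.0 if x <= shift else 1.0 - math.exp(-(x - shift) / scale)


shifted_exp_mean_ns = 450.0 + 700.0   # shift plus scale

print(round(mean_ns), shifted_exp_mean_ns)
for x in (700, 1190, 2100):
    print(x, round(empirical_cdf(x), 3), round(shifted_exp_cdf(x), 3))
```

The mean of the shifted exponential, 450+700 = 1150 ns, matches the average value given for the table, which is presumably why these two parameters were chosen for the dotted curve.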

5.3  MODEL  VALIDATION  VIA  C.mmp  PERFORMANCE  EVALUATION 

In this section the models are used to predict the performance of C.mmp and
the predictions are compared with actual measurements. Figure 5.4 shows the
effect of D.map and the crosspoint switch on the parameters tp, ta and tw. The
access time of the C.mmp core memory is 250 ns. However, a 200 ns nominal switch
and memory control delay yields an effective ta of 450 ns. Each Mp is 8-way
interleaved. Thus the rewrite time of 400 ns is overlapped with the next access
if the next access is to one of the other 7 submodules of the Mp-module. Assuming
random accessing within a module, the average rewrite time experienced is only
50 ns (400 ns incurred with probability 1/8). The effective average processing
time is 1200 ns; the 50 ns delay associated with the relocation registers D.map
is added to the basic value of 1150 ns.
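
The derivation of the effective parameters can be summarized in a few lines. The sketch below (names ours) recomputes ta, the average tw, tc and tp from the figures quoted above, together with the access rate of a single Pc in the absence of interference, 1/(ta+tp).

```python
def cmmp_parameters(core_access_ns=250, switch_delay_ns=200, rewrite_ns=400,
                    interleave=8, base_tp_ns=1150, dmap_delay_ns=50):
    """Effective ta, average tw, tc and tp (all in ns) for C.mmp."""
    ta = core_access_ns + switch_delay_ns
    # The rewrite is seen only if the next access goes to the same submodule.
    tw = rewrite_ns / interleave
    tc = ta + tw
    tp = base_tp_ns + dmap_delay_ns
    return ta, tw, tc, tp


ta, tw, tc, tp = cmmp_parameters()
single_pc_rate = 1e3 / (ta + tp)   # million Mp accesses per second

print(ta, tw, tc, tp)
print(round(single_pc_rate, 5))
```

With these values a single Pc makes one access every 1650 ns, which is the starting point for the multiprocessor predictions.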

The  following  experimenttt  was  conducted  on  the  partial  realization  of 
C.mmp  available  to  date.  A  program  was  loaded  into  one  Mp-module  and  the  three 


TIC.Pierson  and  W.Broadley  were  instrumental  in  the  experimental  set-up. 
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1 

Pc  receives  data  fron  Mp 

2 

Pc  finishes  operating  on  the  data 

3 

Pc  has  address  of  new  data 

4 

Dmap  computes  physical  address  of  data  and 
puts  it  on  the  Pc-Mp  bus. 

5 

Memory  has  received  request  at  the  crosspoint 

switch 

6 

Memory  controller  (part  of  crosspoint  switch) 
the  request  to  be  served  and  sets  up  switch. 

selects 

7 

Data  read  from  storage  location 

8 

Data  sent  through  switch 

9 

Memory  recovers  and  starts  serving  next  request  (if  any) 

Figure  5.4  Unit  instruction  timing  diagram  for  C.mmp 
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5.3  Model  V alidation  via  C.mmp  Performance  Evaluation 

available processors executed the code individually, in pairs, and
collectively. The number of memory cycles was measured. Table 5.5 presents the
results of the experiment. The analytic results predicted by the geometric model
of section 5.2 are given in Table 5.6; simulation results with processing
time having the shifted exponential distribution of Fig. 5.3 are also listed.

The analytic results are very close to the measured performance. However, for
the 3x1 multiprocessor case the analytic result is about 10% higher than the
measured value. This is due to the read-modify-write cycles, which make the Mp
service time greater than tc. The effect of these read-modify-write cycles is
not crucial when the interference is not excessive. The simulation and analytic
results show that the performance can be predicted by simple mathematical models
with reasonable accuracy.
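
The comparison above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the thesis: it assumes that with one Pc and one Mp there is no interference, so each access takes E[tp] + ta, and it takes the 3x1 figures (1.56373 analytic, 1.42466 measured) from Tables 5.6 and 5.5. The function names are hypothetical.

```python
# Sanity check of the 1x1 and 3x1 figures quoted in this section.
# Assumption: with a single Pc and a single Mp nothing ever waits,
# so one access is completed every E[tp] + ta nanoseconds.
E_TP_NS = 1200.0   # effective processing time E[tp] (Table 5.6)
TA_NS = 450.0      # effective access time ta (Table 5.6)


def uncontended_rate(n_pc: int, tp_ns: float, ta_ns: float) -> float:
    """Mp accesses per second, in millions, when no Pc ever waits."""
    return n_pc * 1e3 / (tp_ns + ta_ns)


def relative_excess(predicted: float, measured: float) -> float:
    """Fraction by which the prediction exceeds the measurement."""
    return (predicted - measured) / measured


rate_1x1 = uncontended_rate(1, E_TP_NS, TA_NS)
excess_3x1 = relative_excess(1.56373, 1.42466)

print(f"1x1 rate: {rate_1x1:.5f} million/sec")
print(f"3x1 analytic excess over measurement: {excess_3x1:.1%}")
```

Under these assumptions the 1x1 rate agrees with the first analytic entry of Table 5.6, and the 3x1 gap comes out just under 10%.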


TABLE 5.5

Summary of Measurements on C.mmp
Number of Mp accesses per second
(Millions/sec)

Pc[A]   Pc[B]   Pc[C]   Mp Access Rate
  0       0       1        0.62015
  0       1       0        0.61805
  1       0       0        0.613657
  0       1       1        1.14899
  1       0       1        1.14672
  1       1       0        1.14657
  1       1       1        1.42466
TABLE 5.6

Analytic Results

E[tp] = 1200 ns,  ta = 450 ns,  tc = 500 ns

Number of Pc’s   Number of Mp’s   Mp Access Rate    Simulation Results
                                  (Millions/sec)
      1                1             0.60606
      2                1             1.14157
      3                1             1.56373
      4                4             2.39707             2.411
      8                8             4.76009             4.751
     16               16             9.48638             9.476
CHAPTER 6
CONCLUSIONS

In the previous chapters several analytic models have been presented.
Chapter 2 was devoted to systems in which the effective processing time is equal
to the memory rewrite time. An important result observed was the absence of a
law of diminishing returns. The performance of a multiprocessor system with n
processors and n memories continues to rise at a constant rate as n increases. A
simple exponential server model showed this rate to be 0.5; a constant
processing time model predicted a slope of 0.586 for the average number of busy
Mp’s. The exponential server model gives the average number of busy Mp’s as
n*m/(n+m-1). An approximate result for constant processing times gives the
average number of busy Mp’s as i*/ /, where i=max(n,m) and j=min(n,m).
An intuitively obvious conclusion limits the maximum number of active Pc’s to
min(n,m). This maximum is reached if each processor accesses only one memory all
the time. The model of section 2.6 analyzed the effect of skewing the
processors’ access patterns. Thus, the maximum MpAR is min(m,n)/tc.
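
The constant slope of 0.5 can be seen directly from the exponential server result. The sketch below is an illustration only; it assumes the expression for the average number of busy Mp's reads n*m/(n+m-1), which is consistent with the quoted slope, and the function name is hypothetical.

```python
# Average number of busy Mp's under the exponential server model,
# read here as n*m/(n+m-1) (an assumption consistent with the 0.5 slope).
def busy_mp_exponential(n: int, m: int) -> float:
    """Expected number of busy memories for n processors and m memories."""
    return n * m / (n + m - 1)


for n in (1, 2, 4, 8, 16, 64):
    busy = busy_mp_exponential(n, n)
    print(f"n = m = {n:2d}: busy Mp's = {busy:7.4f}, per processor = {busy / n:.4f}")
```

For n = m the expression reduces to n*n/(2n-1), so the ratio of busy memories to processors approaches 0.5 from above as n grows rather than falling off.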

Chapter 3 contained several models for multiprocessor systems with tp>tw. A
new model for geometrically distributed processing time was developed. A
different analytic approach was used to model systems with private caches for
the processors. In general, since the Pc is slow, it takes fewer memory units
for the performance to exhibit a saturation effect. This can be discerned by
comparing Figs. 2.11 and 3.3a. In the absence of memory contention (which is
now possible even for m<n) the maximum MpAR is n/(ta+tp).

Chapter 4 discussed multiprocessor systems in which tp<tw. Since the
processor is fast, performance improvement is obtained for m>n. If m=n, these
systems do not yield significant improvement over systems with tp=tw. In
general, adding an extra memory improves the performance more than adding an
extra processor. The maximum average MpAR is the minimum of m/tc and n/(ta+tp);
the maximum is achieved if the processors do not interfere. Note that since the
Pc is very fast, it can make a request to the memory that served it last before
the rewrite cycle is over. In this case, the Pc has to wait even though no other
Pc is being serviced by the memory module.
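
The bound on the access rate stated above is easy to tabulate. The following sketch is illustrative only: the function name is hypothetical and the parameter values are example numbers, not measurements from the thesis. It shows how the limit switches from the memories (m/tc) to the processors (n/(ta+tp)) as memories are added.

```python
# Upper bound on the average Mp access rate: min(m/tc, n/(ta+tp)).
# Times are in nanoseconds; the result is in millions of accesses per second.
def max_mpar(n: int, m: int, ta_ns: float, tp_ns: float, tc_ns: float) -> float:
    memory_bound = m * 1e3 / tc_ns               # every Mp busy all the time
    processor_bound = n * 1e3 / (ta_ns + tp_ns)  # no Pc ever waits
    return min(memory_bound, processor_bound)


# Example values only (a fast Pc, tp < tw).
print(max_mpar(8, 8, ta_ns=600, tp_ns=350, tc_ns=1100))    # memory limited
print(max_mpar(8, 16, ta_ns=600, tp_ns=350, tc_ns=1100))   # processor limited
```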

Chapter 5 presented some empirical measurements of PDP-11 programs; C.mmp
uses PDP-11’s as the Pc’s. It also illustrated how unit instructions can be
extracted from the machine instructions. The measurement and
characterization of the processor’s instruction timing were used to obtain the
distribution of the effective instruction processing time, which was then used
to estimate analytically the unit execution rate of C.mmp. Contrary
to common intuition, the unit instruction concept yields a larger effective
processing time for register-register operations, since the processing time is a
measure of the interval between two memory accesses. The average processing time
was found to be 1150 ns for the PDP-11/20. The average processing times for the
PDP-11/40 and 11/45 are approximately 625 and 400 nanoseconds respectively. A
PDP-11/20 instruction can require from 1 to 7 accesses to memory, but the average
PDP-11/20 instruction makes approximately 2 accesses to memory; it comprises two
unit instructions.

6.1  APPLICATIONS 

The models developed in this thesis should give computer system designers
considerable insight into some of the design issues. If tp<tw, the performance
(as indicated by the MpAR) saturates for some value of m that is greater than n.
If tp>tw, however, it may saturate in some cases for m<n. Most system designers
tend to use optimality of cost/performance as an overriding factor. Another
important issue is the extendability, or the effect of changing the system
parameters [BhatS72]. Performance requirements often change and the initial
design should be modifiable to meet the new performance desired. A good
extendable design should therefore have a certain amount of unutilized capacity.
The system should be designed to have a performance slightly greater than that
which maximizes performance/cost.

In a multiprocessor system, the alternatives for change include increasing
the number of Pc’s or Mp’s, as well as replacing the Mp’s or Pc’s with faster
versions. Adding Pc’s and Mp’s is a less viable alternative due to the
extent of the engineering effort involved. In some cases, the entire central
crosspoint switch may have to be either redesigned or rebuilt. However,
replacement with faster upward compatible processors or faster memories is
easier. It is an undisputed fact that increasing the speed of the processor or
memory by a factor of k will not enhance the performance by the same factor. The
analytic models suggest that increasing the processor speed causes the
saturation point to shift to a larger m. If tp<tw, a well-designed system is
likely to have m>n. Since the memory is the limiting factor, the best choice for
enhancing the performance is the use of faster memories. However, if the
original design used more memory modules than needed to saturate the
performance, faster processors will also yield some improvement. If tp>tw,
substitution of faster processors effects an increase in the access rate. To
allow for future substitution of faster processors it is desirable to choose
m>n. Thus, a rule of thumb indicates that a good design choice is m>n, with
not too large a mismatch in Pc and Mp speeds. As described in chapter 5, C.mmp
is a 16x16 multiprocessor system with tp=1200 ns, ta=450 ns and tw=50 ns. The
MpAR expected is 9.476 million/sec. If the PDP-11/20 processor is replaced by a
faster Pc (11/45) with a typical processing time of 450 ns, the MpAR increases to
15.6 million/sec. However, since the system is very much processor speed
limited, a 150 ns cache for each Pc increases the MpAR only by 14% and 20% for
hit ratios of 0.5 and 0.7 respectively.
6.1.1  An Illustrative Example

In order to illustrate the use of the models developed in this thesis,
consider an 8x8 multiprocessor system with tp=tw. This configuration is still
memory limited, as seen from Fig. 2.10. Let the memory cycle time be 950 ns and
the access time 400 ns. Assume a switch delay of 200 ns and a 50 ns delay in the
relocation hardware. The processor has a typical processing time of 450 ns.
Hence, the effective processing time is 500 ns, the effective access time is
600 ns, and the effective cycle time is 1100 ns (assuming that 50 ns of the
switch delay is overlapped with the rewrite cycle).

The discrete Markov chain model of Chapter 2 gives the average number of
busy Mp’s to be 4.9471; the MpAR is 4.4974 million/sec. Let us evaluate the
effect of replacing the original Pc with either a 300 ns or a 150 ns processor.
Note that the effective processing times are then 350 ns and 200 ns. The
corresponding percentage increases in the MpAR, as computed by the models of
chapter 4, are 4.33% and 9.05%. This is not surprising since the original system
was memory speed limited.
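
The effective times and the access rate quoted in this example follow from simple arithmetic. The sketch below shows one reading of that arithmetic and is not taken from the thesis: it assumes the relocation delay is added to the processing time, the switch delay to the access time, and the overlapped part of the switch delay subtracted from the rewrite time. The helper names are hypothetical.

```python
# Effective times for the illustrative 8x8 system (all times in ns).
def effective_times(tp_ns, ta_ns, cycle_ns, switch_ns, reloc_ns, overlap_ns):
    """Return (effective tp, effective ta, effective tw, effective tc)."""
    eff_tp = tp_ns + reloc_ns                 # Pc time plus relocation delay
    eff_ta = ta_ns + switch_ns                # access time plus switch delay
    eff_tw = (cycle_ns - ta_ns) - overlap_ns  # rewrite time minus the overlap
    return eff_tp, eff_ta, eff_tw, eff_ta + eff_tw


def mpar(busy_mp: float, eff_tc_ns: float) -> float:
    """Mp access rate in millions/sec: busy memories per effective cycle."""
    return busy_mp * 1e3 / eff_tc_ns


tp, ta, tw, tc = effective_times(450, 400, 950, 200, 50, 50)
rate = mpar(4.9471, tc)   # 4.9471 busy Mp's is the Markov chain result
print(tp, ta, tw, tc)
print(f"{rate:.4f} million/sec")
```

With these assumptions the effective processing and rewrite times are both 500 ns, which is why the tp=tw model applies to the original configuration.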

Instead of changing the Pc’s, the Mp’s could be changed. If the new Mp has
ta=250 ns and tw=300 ns, the effective ta and tw are 450 ns and 250 ns
respectively. The new cycle time is 700 ns. Since tp>tw the models of chapter 3
can be used. The new Mp access rate is 44.39% higher.

A third alternative is the use of a 150 ns access time cache. The value of
α, the probability of finding the data in the cache, depends on the size of the
cache. Consider three sizes that result in α=0.5, α=0.7 and α=0.8. The cache
model of chapter 3 predicts performance enhancements of 67%, 107%, and 130%.

The  above  performance  data  coupled  with  cost  information  can  be  used  to 
select  a  profitable  parameter  change.  When  the  number  of  design  alternatives  is 
large,  simple  analytic  models  help  to  determine  a  judicious  choice. 


6.2  PROPOSALS  FOR  FUTURE  WORK 


This  thesis  is  not  a  panacea  for  multiprocessor  system  designers.  An 
attempt  has  been  made  to  develop  some  simple  basic  tools.  Anyone  working  with 
systems  is  aware  of  their  high  degree  of  complexity  and  is  likely  to  be  shocked 
when  he  sees  the  simplicity  of  the  models  suggested.  He  may  react  negatively 
when  he  notices  how  much  of  the  real  system  has  been  left  out  and  how 
restrictive  the  assumptions  are  that  have  been  made  in  the  analysis.  This 
skepticism  is  not  entirely  justified;  simple  analytic  models  often  exhibit 
overall  behavior  similar  to  the  complex  system  modeled. 

One of the main assumptions made in this thesis is the independence of
successive memory requests. In most real systems, due to program locality, there
is some serial correlation between requests made by a Pc. Thus, if a Pc accesses
Mp[j] it continues to do so for some amount of time. A modeling technique
suggested is dividing the system activities into various phases. The number of
Mp’s accessed in each phase is different. Jackson’s general exponential server
model [JackJ63] indicates that the stationary state probabilities depend only on
the average frequency of visits to the various servers. Therefore the effect of
the serial correlation of requests is not seen. However, for real systems, which
are not exponential, some degradation may be observed. We also assume p_ij=1/m.
As indicated in section 2.6, this is not the most desirable access pattern. The
effect of skewing the access patterns is to improve the MpAR. Given the above
two opposing effects, p_ij=1/m serves as a good parameter value for comparison at
a high level.

Since no a priori empirical evidence is available, a large portion of
future research activity should involve the measurement of real systems. This
will bring to light the seriousness of the assumptions and establish a proper
framework and area of applicability of analytic models. Measurement of dynamic
program behavior should be stressed. It should be remembered that the output of
the models can only be as good as the input. System analysts should not neglect
the parameter estimation phase of performance prediction.

Future analytic studies should attempt to differentiate between instruction
and data references. At a higher level, memory interference models can be used
as a part of an overall hierarchical model of the computer system at the program
level. Empirical studies should be conducted in order to obtain a good, concise
characterization of component and subsystem behavior. Such information can be
used to drive more efficient simulations of complex systems.
LISTING OF PROGRAM FOR THE ALGORITHM SHOWN IN FIG. 2.5

C
C     DISCRETE MARKOV CHAIN TP=TW -- STATE(NO. OF STATES, NO. OF PC)
C
      INTEGER STACK(16,16),STATE(25,16),A(16),FIRST(16)
     1,IDONE(25),PTR(16),DONE(16),NWAYS(16),BOUND(17)
      DIMENSION TRANS(25,25),B(25),Z(25)
      COMMON TRANS,B,Z,NPARTS
      INTEGER SUM,T,X,Q
C
C
      TYPE 1973
 1973 FORMAT(1X,'NUMBER OF PROCESSORS',/)
      ACCEPT 1974,NPC
 1974 FORMAT(I)
      TYPE 1975
 1975 FORMAT(1X,'NUMBER OF MEMORIES',/)
      ACCEPT 1974,NMP
      MINPM=MIN0(NPC,NMP)
C
      M=1
      PTR(NPC-1)=1
      GO TO 31
   20 CONTINUE
      IF(NPC.GT.M)PTR(NPC-M)=NPARTS+1
      DO 21 I=2,M
   21 A(I)=1
   30 SUM=0
      DO 22 I=2,M
   22 SUM=SUM+A(I)
   31 A(1)=NPC-SUM
C
C     WRITE(15,2000),(A(I),I=1,M)
 2000 FORMAT(1X,16(I3,1X))
      NPARTS=NPARTS+1
      DO 17 K=1,M
   17 STATE(NPARTS,K)=A(K)
C
      T=2
   60 CONTINUE
      IF(T.GT.M)GO TO 120
      X=A(1)-A(T)
      IF(X.GT.1)GO TO 100
      T=T+1


GO  TO  60 

100  CONTINUE 
ITMP-A(T) 

DO  101  1-2,  T 

101  A ( I ) - I TMP+1 
GO  TO  30 

120  M-M+l 

BOUND (M)-NPARTS 
C  WRITE  (15,2001) ,NP ARTS 

2001  FORMAT  (/,  IX,  %'nWnWrfnWnV  ’,13,’  ***********  ,  / ) 

IF (M.LE.MINPM)GO  TO  20 
WRITE (16, 111), NPARTS 

111  FORMAT!/, IX,’ NUMBER  OF  PARTITIONS-* , I) 

WRITE (15, 3113) 

3113  FORMAT (///) 

C 

C 

C 

C  INITIALIZE  THE  STACK,  NWAYS,  FIRST 

C 

DO  2100  I-l.NPC-l 
STACK (I , 1 ) -I 
NWAYS  (I )  -1 

2100  FIRST (I) -2 

FIRST (NPC-l)-l 
C 

C  BEGIN  WALK  THROUGH  TREE 

C 

L-NPC 

1  CONTINUE 

IF  (DONE  (L)  .EQ.DGO  TO  25 

C 

200  K-L-l 

J-FIRST (K) 

C 

DO  10  I-J.NMP 

IF (STACK (K, I ) ,NE. STACK (K,J) ) GO  TO  15 
10  CONTINUE 

C 

DONE  (L) — 1 
FIRST (K) -1 
I-I+l 
C 

15  NWAYS  (L) -I -J 

C 

DO  300  KK-1,NMP 
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300  STACK (L.KK) -STACK (K,KK) 

STACK (L.J) -STACK (K,J)+1 
C 

IF (DONE(L) .NE. 1) FIRST (K)-I 
C 

IF (L.EQ.NPC)GO  TO  1000 
C 
C 

C  SET  PTR  TO  ORIGINAL  STATE  AT  LEVEL  L 

C 

I F (°TR (L) . NE.  0)  I  DONE (PTR  (L) )  -1 

K-NFC-L 

Kl«BOUND(K)+l 

K2-B0UND (K+l ) 

C 

DO  35  JJ-K.l.-l 
C 

DO  32  KK-K1.K2 

I F  (STATE  (KK  ,JJ).EQ.  STACK  (L,JJ)+1)  GO  TO  34 
32  CONTINUE 

C 

KK-0 

GO  TO  3G 

34  Kl-KK 
C 

35  CONTINUE 
C 

3G  PTR (L) -KK 

C 
C 
C 

L-L+l 
GO  TO  200 
C 
C 

25  DONE (L ) *0 

L-L-l 

IF (L. EQ. 1 ) GO  TO  7777 
C 

GO  TO  1 
C 
C 

C  FIND  TERMINAL  STATE 

C 

1000 
C 


CONTINUE 


1 
00  1100  K-l.NMP 

IF (STACK (NPC, K) .EQ.01GO  TO  1150 


1100 

CONTINUE 

K-NMP+1 

C 

1150 

K-K-l 

K1-B0UN0(K)+1 


K2-B0UN0 (K+l ) 

C 

00  135  JJ-K, 1 , -1 
C 

00  132  KK-K1 ,K2 

I F (STATE (KK, JJ) .EQ. STACK (NPC, JJ))  GO  TO  134 
132  CONTINUE 

C 

TYPE  987G 

9875  FORMAT (IX, ’CANNOT  FINO  TERMINAL  STATE  !!!!!!’) 

C 

134  Kl-KK 
C 

135  CONTINUE 
C 

C  UPOATE  TRANSITION  MATRIX 

C 

TEMP-1 

DO  150  I -NPC-1 ,1,-1 
II-PTRU) 

TEMP-TEMPftNUAYS ( I +1 ) 

IF (I  I .EQ.01GO  TO  150 

I F  ( I  DONE  ( 1 1 ) .  NE .  1 )  TRANS  (KK ,  1 1 )  -TRANS  (KK ,  1 1 )  +TEMP 
150  CONTINUE 

C 

I F  (NPC.  LE .  NMP)  TRANS  (KK ,  NPARTS)  -TRANS  (KK,  NPARTS)  +TEMP*NMP 
C 

GO  TO  1 
C 
C 

C  TRANSITION  MATRIX  HAS  BEEN  GENERATED 

C 

7777  CONTINUE 

C  00  5544  I -1, NPARTS 

CS544  URI TE  (15,4455) ,  (TRANS (I ,  J) ,  J-l ,  NPARTS) 

C445S  FORMAT (IX, 25 (F8. 4,1X1) 

C 

00  7250  I-l.MINPM 
Y-NMPvwvI 
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DO  7250  J-BOUNO  ( I )  +1 .  BOUND  ( I  +1 ) 

DO  7100  K-l, NPARTS 

7100  TRANS  (K,  J)  >»TRANS  (K ,  J)  / Y 

7250  CONTINUE 

C  DO  5544  I -1, NPARTS 

C5544  WRITE (15,4455) ,  ( TRANS (1 ,  J) ,J«1, NPARTS) 

4455  FORMAT (IX, 25 (F8.G, IX) ) 

C 

DO  8000  I -1 , NPARTS 
TRANS (I, I) -TRANS (I, I) -1.0 
8000  TRANS (NPARTS, I) -1.0 

C 

B (NPARTS) -1 
C 

CALL  GAUSS 
C 

DO  8500  I-l.MINPM 
C 

TMP-0 

DO  8250  J -BOUND  ( I )  +1 ,  BOUND  ( I  +1 ) 

8250  TMP-TMP+ZU) 

C 

8500  OCC-OCC+TKPal 

C 

WRITE (15. 8750), OCC 

8750  FORMAT  (//,  IX,  ’EXPECTED  VALUE  OF  NO.  OF  BUSY  HP-  ’ 

1.F10.B) 

C 

STOP 

END 
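
The first stage of the listing above enumerates the states of the chain as partitions of the number of processors over the memories. The sketch below illustrates the same idea in Python. It is not a transcription of the FORTRAN: the recursive generator and its name are assumptions made for the example, and it only shows what the set of states looks like.

```python
# Enumerate partitions of n into at most max_parts parts, each partition
# written as a non-increasing tuple (queue lengths at the busy memories).
def partitions(n, max_parts, largest=None):
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


states = list(partitions(4, 4))
print(states)
# Grouping by the number of parts gives the number of busy memories per state.
busy_counts = sorted(len(p) for p in states)
print(busy_counts)
```

For four processors and four memories this yields five states, from (4,) with one busy memory to (1, 1, 1, 1) with four.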

C 

C 

      SUBROUTINE GAUSS
C     THIS SUBROUTINE SOLVES SIMULTANEOUS LINEAR EQUATIONS
C     A*X=B
C
      DIMENSION A(25,25),B(25),X(25)
      COMMON A,B,X,N
      INTEGER S
C
      S=N
    5 CONTINUE
      IF(S-1) 50,50,10
   10 CONTINUE
      IF(S.GT.2)GO TO 105
      D=A(1,1)*A(2,2)-A(1,2)*A(2,1)
      IF(ABS(D).GT.0.0005)GO TO 105
      TYPE 25
      GO TO 100
  105 DO 20 I=1,S
      M=S-I+1
      IF(ABS(A(M,S)).GT.0.0005)GO TO 30
   20 CONTINUE
      TYPE 25
   25 FORMAT(1X,'THE COEFFICIENT MATRIX IS SINGULAR',/)
      GO TO 100
   30 CONTINUE
      IF(M.EQ.S)GO TO 40
      T=B(S)
      B(S)=B(M)
      B(M)=T
      DO 35 J=1,S
      T=A(S,J)
      A(S,J)=A(M,J)
      A(M,J)=T
   35 CONTINUE
   40 CONTINUE
      DO 45 I=1,S-1
      K=S-I
      IF(ABS(A(S,S)).GT.0.0005)GO TO 42
      GO TO 100
   42 B(K)=B(K)-B(S)*A(K,S)/A(S,S)
      DO 45 J=1,S-1
      A(K,J)=A(K,J)-A(K,S)*A(S,J)/A(S,S)
   45 CONTINUE
      S=S-1
      GO TO 5
   50 CONTINUE
      DO 70 I=1,N
      SUM=B(I)
      DO 60 J=1,I-1
      SUM=SUM-A(I,J)*X(J)
   60 CONTINUE
      IF(ABS(A(I,I)).GT.0.0005)GO TO 61
      GO TO 100
   61 X(I)=SUM/A(I,I)
   70 CONTINUE
  100 CONTINUE
      RETURN
      END
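
The main program obtains the stationary state probabilities by subtracting the identity from the transition matrix, replacing the last equation with the normalization condition, and calling GAUSS. A minimal Python sketch of that procedure follows. It is an illustration under stated assumptions, not a port of the listing: it uses ordinary elimination with partial pivoting, the function names are hypothetical, and the matrix is taken to be column-stochastic (entry [i][j] is the probability of moving from state j to state i).

```python
# Solve a*x = b by Gaussian elimination with partial pivoting.
def solve_linear(a, b, eps=1e-12):
    n = len(b)
    a = [row[:] for row in a]   # work on copies
    b = b[:]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < eps:
            raise ValueError("the coefficient matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


# Stationary distribution z with T*z = z and sum(z) = 1.
def stationary(t):
    n = len(t)
    a = [[t[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[n - 1] = [1.0] * n            # last equation: probabilities sum to one
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


z = stationary([[0.9, 0.5],
                [0.1, 0.5]])
print(z)
```

Weighting each probability by the number of busy memories in its state and summing then gives the expected number of busy Mp's, as the main program does in its final loop.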
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C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 


1973 

1974 

1975 

1949 


C 


C 

C 

C 

C 

421 

4421 

C 


C 

C 
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****VoVVoV»,f*Vf*VfVoVVn,f**V«V***************************Vn,n,n'nWnV**** 


THIS  PROGRAM  SOLVES  THE  DISCRETE  MARKOV  CHAIN  FOR  TP-TU 
USING  SINGLE  LEVEL  TRANSITIONS 
LONG  VERSION 


Vf>V»V>V>V>VV’fV’nVV'oV*>V>V>V>V>V>V>V>V>,o,o,oV>V>V>V>V>VVoVV«VVV>VVf>VfnV>VVfVVVf>VV,o,oVtfiiV>V»ViV>V>V>V>V»V>V»V 

INTEGER  STATE (18G, 1G) , BOUND (17) ,B1 (17)  ,B2 (17) 

INTEGER  SI (116, 16) , S2 (146, 16) 

DIMENSION  TRANS (186, 186) ,B (186) ,Z (186)  ,X12(146, 116)  ,X23  (186,146) 
DIMENSION  C (186) 

COMMON  NP ARTS, TRANS, B,Z 
INTEGER  T,Q 
INTEGER  A (16 ) , BD (17) 

TYPE  1973 


FORMAT (IX, ’NUMBER  OF  PROCESSORS’,/) 
ACCEPT  1974, NPC 

FORMAT  (I) 

TYPE  1975 


FORMAT (IX, ’NUMBER  OF  MEMORIES’,/) 

ACCEPT  1974, NMP 
URITE(15, 1949) , NPC, NMP 

FORMAT (IX, ’NUMBER  OF  PROCESSORS  -’,12,/, 
11X, ’NUMBER  OF  MEMORIES  -’,12,/) 

MINPM-MIN0 (NPC, NMP) 

LEVEL-NPC 


IPRT-1 
GO  TO  4000 


PARTITION  NPC  INTO  NMP  PARTS 
FINAL  STATE  VECTOR  —  STATE 

NPARTS-NNPART 
DO  4421  I =1 ,M 

BOUND ( I ) =BD ( I ) 


IPRT-2 
LEVEL-NPC-1 
GO  TO  4000 

PARTITION  NPC-1 
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C  PARTIAL  ST,  TE  VECTOR  —  S2 

C 

422  N2-NNPART 

DO  4422  I-1,M 

4422  B2UNBDU) 

C 

C  GENERATE  NUMBER  OF  WAYS  OF  GOING  FROM  S2  TO  STATE 

C 

LSRC-MIN0 (NPC-1 , NMP) 

DO  1  I-l.NPARTS 
DO  1  J-1.N2 
1  X23(I,J)-0 

DO  40  K-l.LSRC 

I1-B20O+1 

I2«B2(K+1) 

DO  40  L-11,12 
K1 -BOUND  (10+1 
K2-BOUND(K+2) 

IF (MT NPM.LT.K+1 ) K2 -BOUND (K+l ) 

CD  32  KK-K1.K2 
M=0 

JJ-MIN0 (MINPM.K+l ) 

DO  20  1=1, JJ 

IF (S2 (L, I ) .EQ. STATE (KK, I ) ) GO  TO  20 
IF (S2(L, I) .NE. STATE (KK, I)-l)GO  TO  32 
M-M+l 
1 1 -I 

20  CONTINUE 

IF (M. NE. 1 )GO  TO  32 
DO  25  I  - 1 1 , NMP 

I F  (S2  (L , I ) . NE . S2 (L , 1 1 ) ) GO  TO  27 
25  CONTINUE 

1  =  1+1 

27  X23  (KK,L) =1 -I  I 

32  CONTINUE 

A0  CONTINUE 

C 

C  UPDATE  TRANSITION  MATRIX  —  TRANS 

C 

K-l 

K1 -BOUND (KI+l 
K 2-BOUND (K+l) 

C 

DO  100  L-1.N2 
C 

DO  35  JJ-K.1,-1 
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C 

DO  5832  KK-K1.K2 

I F  (STATE  (KK ,  JJ) . EQ.  S2  (L ,  JJ)  +1 ) GO  TO  34 
5632  CONTINUE 

C 

KK«0 
GO  TO  38 

34  Kl-KK 
C 

35  CONTINUE 
C 

36  CONTINUE 

IF (KK. EQ. 0) GO  TO  100 
DO  50  I-l.NPARTS 

50  TRANS ( I , KK ) -TRANS ( I , KK ) +X23 (I ,L) 

100  CONTINUE 

C 

'  C  DECREMENT  LEVEL 

C 

LEVEL-NPC-2 

200  CONTINUE 

IF (LEVEL. EQ.0)GO  TO  300 
C 

C  PARTITION  LEVEL 

C  PARTIAL  STATE  VECTOR  —  SI 

C 

IPRT=3 
GO  TO  4000 

423  Nl-NNPART 

DO  4423  M,M 

4423  BKI)-BD(I) 

C 

C  COMPUTE  NUAYS  FOR  GOING  FROM  SI  TO  S2 

C 

LSRC-MN0 (LEVEL, NMP) 

C 

DO  341  I-1.N2 
DO  341  J-l.Nl 

341  X12(I, J)-0 

DO  5740  K-l.LSRC 
I 1=B1 (K)+l 
I2-BKK+1) 

DO  5740  L-11,12 
K1-B2(K)+1 
K2=B2 (K+2) 

IF (MINPM.LT.K+1)K2=B2(K+1) 


i 
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00  5732  KK-K1.K2 
n-0 

jj-min0(minpm,k+d 

DO  577  I-l.JJ 

IF  (SI (L, I ) ,EQ.S2(KK, I ) )G0  TO  577 
IF  (SI (L, I ) .NE. S2(KK, I ) -1 )G0  TO  5732 
M-M+l 
1 1 -I 

577  CONTINUE 

IF (M.NE. 1)G0  TO  5732 
DO  5725  I-II.NMP 
IF  (SI (L, I ) .NE.S1 (L, 1 1 ) )G0  TO  979 
5725  CONTINUE 

I-I+l 

579  X12 (KK.L) -I -I  I 

5732  CONTINUE 

5740  CONTINUE 

C 

C  MATRIX  MULTIPLICATION  TO  GET  NUAYS  FROM  SI  TO  STATE 

C 

IC-1 

GO  TO  125 

1255  CONTINUE 

C 

C  UPDATE  TRANS 

C 

K-NPC-LEVEL 

IF  (K.GT.MINPM)GO  TO  5800 
K1 -BOUND  (10+1 
K2-B0UND(K+1) 

C 

DO  5800  L-l.Nl 
C 

00  5835  JJ-K, 1 , -1 
C 

00  5832  KK-K1 ,K2 

IF  (STATE (KK, JJ) .EQ.S1 (L, JJ) +1 )G0  TO  5834 
5832  CONTINUE 

C 

KK-0 

GO  TO  583S 


5834 

Kl-KK 

C 

5835 

CONTINUE 

C 

5835 

CONTINUE 
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IF (KK.EQ.0)GO  TO  5800 
DO  5850  I-l.NPARTS 

5850  TRANS ( I , KK) -TRANS ( I , KK ) +X23 (I ,L) 

5800  CONTINUE 

C 

C  DECREMENT  LEVEL 

C 

LEVEL -LEVEL-1 
IF (LEVEL. EQ.0IGO  TO  300 
C 

C  PARTITION  LEVEL 

C  PARTIAL  STATE  VECTOR  —  S2 

C 

IPRT-4 
GO  TO  4000 

424  N2-NNPART 

DO  4424  I -1,M 

4424  B2(I)-BD(I) 

C 

C  COMPUTE  NUAYS  FROM  S2  TO  SI 

C 

LSRC-M I N0 (LEVEL , NMP ) 

C 

DO  591  I-l.Nl 
DO  591  J-1.N2 

591  X12(I ,  J)-0 

DO  5940  K-l.LSRC 

1 1- B2  (K)+l 

12- B2IK+1) 

DO  5940  L-11,12 
Kl-Bl (K)+l 
K2=B1 (K+2) 

IFIMINPM.LT.  K+DK2-BHK+1) 

DO  5932  KK-K1.K2 
M-0 

JJ-MIN0 (MINPM.K+l) 

DO  5920  I-l.JJ 

IF (S2(L, I ) .EQ.S1 (KK, I ) ) GO  TO  5920 
IF (S2(L, I ) .NE.S1 (KK, I ) —1 ) GO  TO  5932 
M-M+l 
)  I-I 

5920  CONTINUE 

IF (M.NE. l)GO  TO  5932 
DO  5925  I-I I, NMP 

I F (S2 (L , I ) . NE . S2 (L , 1 1 ) ) GO  TO  5927 
5925  CONTINUE 
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1=1+1 

5927  X12.(KK,L)-MI 

5932  CONTINUE 

5940  CONTINUE 

C 

C  MATRIX  MULTIPLICATION  GIVES  NUAYS  FROM  S2  TO  STATE 

C 

I  C=2 

GO  TO  125 

1260  CONTINUE 

C 

C  UPDATE  TRANS 

C 

K-NPC-LEVEL 

IF (K.GT.MINPM)GO  TO  6300 
K1 -BOUND (K)+l 
K2-B0UND (K+l ) 

C 

DO  6300  L-1.N2 
C 

DO  6335  JJ=K, 1 , -1 
C 

DO  6332  KK-K1 ,«2 

IF  (STATE (KK, JJ) . EQ.S2 (L, JJ) +1 ) GO  TO  6334 
5332  CONTINUE 

C 

KK-0 

GO  TO  6336 

5334  Kl-KK 

C 

6335  CONTINUE 
C 

6336  CONTINUE 
IF (KK.EQ.0)GO  TO  6300 
DO  6350  I =1 , NPARTS 

5350  TRANS ( I , KK ) -TRANS ( I , KK ) +X23  (I ,L) 

6300  CONTINUE 

C 

C  DECREMENT  LEVEL 

C 

LEVEL-LEVEL-1 
GO  TO  200 

306  CONTINUE 

I F (NPC  GT.NMP1G0  TO  400 
DO  350  1=1, NPARTS 

350  TRANS  ( I , NPARTS)  -TRANS  ( I ,  NPARTS )  +1 . 0*NMP*X23  (1,1) 
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400  CONTINUE 

C 

C 

C  TRANSITION  MATRIX  HAS  BEEN  GENERATED 

C 

7777  CONTINUE 

C  DO  5544  I-l.NPARTS 

C5S44  WRITE (15,4455) ,  (TRANS ( I , J) , J»1 ,NPARTS) 

C4455  FORMAT (1X,25(F8.4, IX) ) 

C 

DO  7250  I-l.MINPM 
Y-(1.0*NMP)**I 

DO  7250  J=BOUND ( I ) +1 , BOUND ( I +1 ) 

DO  7100  K-l.NPARTS 

7100  TRANS (K,  J) *TRANS (K, J) /Y 

7250  CONTINUE 

C  DO  5544  I-l.NPARTS 

C5544  WRITE (15, 4455),  (TRANSd  ,J)  .J-l.NPARTS) 

C4455  FORMAT (IX, 25 (F8. 6, IX)) 

C 

DO  8000  Ul.NPARTS 
TRANSd,  I )  -TRANS  (I ,  I )  -1.0 
8000  TRANS (NPARTS,  I ) -1 . 0 

C 

B (NPARTS) =1 
C 

CALL  GAUSS 
C 

DO  8500  Ul.MINPM 
C 

TMP-0 

DO  8250  J-BOUNO  ( I ) +1 , BOUND ( I +1 ) 

8250  TMP=TMP+Z (J) 

WRITE (15, 9321), I, TMP 

9321  FORMAT <1X,’PR0B (NO.  OF  BUSY  MEMS-’ ,  12,  ’  )-’,lX,F10.8) 

C 

8500  OCC«OCC+TMP*I 

C 

WRITE  (15, 8750), OCC 

8750  FORMAT  (//, IX,  ’ EXPECTED  VALUE  OF  NO.  OF  BUSY  HP-  ’ 

1.F10.B) 

C 

STOP 

C 

C  MATRIX  MULTIPLICATION  IN-LINE  SUBROUTINE 

C 
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125  CONTINUE 
N-NPARTS 

NM-N2 

fl-Nl 

IF(1C.EQ.1)G0  TO  126 

NM-N1 

M-N2 

126  CONTINUE 
C 

DO  511  I -l.N 
DO  5111  J-1,M 
C(J)-0 

DO  5111  K-l,Nf1 

51H  C  (J)  =C(J)+X23(I ,K)*X12(K, J) 

DO  511  J«l,n 

511  X23CI .  J)-C(J) 

IF (IC.EQ. 1)G0  TO  1255 
GO  TO  1260 
STOP 
C 
C 

C  IN-LINE  SUBROUTINE  FOR  GENERATING  PARTITIONS 

C  STATE  VECTORS 

C 
C 

4000  ’  MI  NL-m  N0  (LEVEL ,  NnP) 

C 

NNPART-0 

SUn-0 

M-l 

GO  TO  31 

220  CONTINUE 

DO  21  I-2.M 

21  A(I)«1 

30  sun-0 

DO  22  I-2,n 

22  sun-sun+Ad) 

31  A(l) -LEVEL -sun 
C 

C  URITE (15,2000) , (A(I),I»1,M) 

2000  FO-WAT  (IX.  16(13.  IX) ) 

NNPART-NNPART+1 
GO  TO  (171,172,173,172), IPRT 
171  CONTINUE 

DO  1711  K-l,n 

1711  STATE (NNPART,K)-A(K) 


- ^ 
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IF (NMP.LE.M)GO  TO  1700 
DO  181  K-M+l.NMP 

181  STATE (NNPART.K) -0 
GO  TO  1700 

172  CONTINUE 
DO  1722  K-l.M 

1722  S2(NNPART,K).AOO 

IF (Nf1P.LE.fi) GO  TO  1700 
DO  182  K-M+l.NMP 

182  S2(NNPART,K)«0 
GO  TO  1700 

173  CONTINUE 
DO  1733  K-l.M 

1733  SI (NNPART.K) -A (K) 

IF  (NI1P.LE.fi) GO  TO  1700 
DO  183  K«M+1,NMP 

183  Sl(NN7ART,K)-0 

1700  T-2 

G0  CONTINUE 

IF (T.GT.M)GO  TO  1220 
I IX-A  (1) -A (T) 

IF (I IX.GT. l)GO  TO  1000 

T-T+l 

GO  TO  60 

1000  CONTINUE 

I  TMP-A (T) 

DO  101  1-2, T 

101  A  (I ) -I  TMP+1 

GO  TO  30 

1220  M-M+l 

BO (M) -NNPART 

C  WRITE (15, 2201), NNPART 

^®1  FORMAT  (/,  IX,  ’  *********  ’ ,  13,  * ********** * ,  /) 

I F  ( M . LE . M I NL ) GO  TO  220 
GO  TO  (421, 422, 423, 424),  IPRT 
C 
C 

END 

C 

      SUBROUTINE GAUSS
C     THIS SUBROUTINE SOLVES SIMULTANEOUS LINEAR EQUATIONS
C     A*X=B
C
      DIMENSION A(186,186),B(186),X(186)
      COMMON N,A,B,X
      INTEGER S
C
      S=N
    5 CONTINUE
      IF(S-1) 50,50,10
   10 CONTINUE
      IF(S.GT.2)GO TO 105
      D=A(1,1)*A(2,2)-A(1,2)*A(2,1)
      IF(ABS(D).GT.0.0005)GO TO 105
      TYPE 25
      GO TO 100
  105 DO 20 I=1,S
      M=S-I+1
      IF(ABS(A(M,S)).GT.0.0005)GO TO 30
   20 CONTINUE
      TYPE 25
   25 FORMAT(1X,'THE COEFFICIENT MATRIX IS SINGULAR',/)
      GO TO 100
   30 CONTINUE
      IF(M.EQ.S)GO TO 40
      T=B(S)
      B(S)=B(M)
      B(M)=T
      DO 35 J=1,S
      T=A(S,J)
      A(S,J)=A(M,J)
      A(M,J)=T
   35 CONTINUE
   40 CONTINUE
      DO 45 I=1,S-1
      K=S-I
      IF(ABS(A(S,S)).GT.0.0005)GO TO 42
      GO TO 100
   42 B(K)=B(K)-B(S)*A(K,S)/A(S,S)
      DO 45 J=1,S-1
      A(K,J)=A(K,J)-A(K,S)*A(S,J)/A(S,S)
   45 CONTINUE
      S=S-1
      GO TO 5
   50 CONTINUE
      DO 70 I=1,N
      SUM=B(I)
      DO 60 J=1,I-1
      SUM=SUM-A(I,J)*X(J)
   60 CONTINUE
      IF(ABS(A(I,I)).GT.0.0005)GO TO 61
      GO TO 100
   61 X(I)=SUM/A(I,I)
   70 CONTINUE
  100 CONTINUE
      RETURN
      END
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C 

C 

C 

C 

C 

C 

C 

C 


C 

C 

1973 

1974 

1975 

C 

C 

300 

C 

400 

C 


500 

C 


10 
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V»VVnVV»VVnVV»VVnVy»V*VnVVnVVnWnV***VnV********;nV**************VnV**>-nV** 


APPROXIMATE  MARKOV  CHAIN  MODEL  FOR  TP-TU 


^Vf^VVnVVnVVfVnVVnVVnVVoVVnVVnVVnWrftVnVVnVVnVVnV^Vn'nVVn'n'nWn'n'nVVf^Vo’oV****^**** 


DIMENSION  S  (17,17) , CM (16, 16) , TRANS (17, 17) 
INTEGER  COMB (16, 16) 

DIMENSION  Z (17) , B (17) 

INTEGER  D 

COMMON  TRANS, B.Z.MINPM 


TYPE  1973 

FORMA T( IX,’ NUMBER  OF  PROCESSORS’,/) 
ACCEPT  1974, N 

FORMAT (I) 

TYPE  1975 

FORMAT ( IX,’ NUMBER  OF  MEMORIES’,/) 
ACCEPT  1974, M 
XM-1 . 0*M 
MINPM-MIN0(N,M) 


DO  300  1-1,17 
S(I,l)-0 

S (1, I ) -0 

S(l,l)-1 

DO  400  1-1,16 
DO  400  J-1,1 

S  ( I  +1 ,  J+l )  -l ,  0>vJ>vS  ( I ,  J+l ) +S  ( I ,  J) 

DO  500  1-1,16 
CM ( I , 1 ) — I 
DO  500  J-2,1 

CM  ( I ,  J)  =CM  ( I ,  J-l )  Vfl ,  0>v  ( I  -  J+l ) 

COMB (1,1 )-l 
DO  20  K-2.1G 
COMB (K, 1 ) «K 
L-MIN0 (K, 16) 

DO  10  1=2, L 

COMB  (K,  I )  -COMB  (K, I-1)*(K-I+1) /I 
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20 

C 

C 

C 

C 

C 

C 

C 


C 

595 


G95 

700 

C 

1000 

C 


8000 

C 


CONTINUE 


DO  1000  I -1, MINPM 


xi-i.0*i/xn 

a 


00  700  J-0, I 

™E  NUMBER  OF  ACTIVE  PC'S  THAT  MAKE  A  REQUEST  TO  THE  I  MP’S 
XPROB  IS  THE  PROB  THAT  N-I+J  PC  MAKE  A  REQUEST  TO  THE  I  MP’S 


IF  (J,EQ.0)XPROB-(1.0-XI)**I 
XMULT  « (1 . 0-X I )  Mt  ( I  -J) 

IF  (I .  EQ.  JIXMULT-1.0 

I F  ( J.  NE .  e)  XPROB -1 . 0*COMB  ( I ,  J)  ,v  (X I  *>vJ)  *XMULT 
NN-N-I+J 

K1-MIN0 (NN, I ) 

1 1-1 

IF (K1 . EQ. 0) 11-0 
DO  700  L1-I1.K1 
TEMPI -1 

IF  (LI .  EQ. 0)GO  TO  S95 
Xl« (1. 0*I)**NN 

TEMPI  —CM (I , LI ) ,vS (NN+1 , Ll+1 ) /XI 
NBUSY-NO.  OF  BUSY  MP’S  DURING  NEXT  CYCLE 
K2-MIN0(I-J,r-I) 

12-1 

IF (K2.EQ. 0) 12-0 
DO  700  L2-I2.K2 
TEMP2-1 
NBUSY-L1+L2 
IF (L2.EQ. 0) GO  TO  695 
X2-  (1 . 0»v  (M- 1 ) )  ,v*  ( I  -J) 

T EMP2-CM  (M-I  ,L2)*S(I  -J+l , L2+1 )  /X2 

TRANS  (NBUSY,  I )  -TRANS  (NBUSY,  I )  +XPR0B*TEMP1*TEMP2 
CONTINUE 


CONTINUE 

DO  8000  I-l.MINPM 
TRANS (1,1)  -TRANS (I ,  I )-l,0 

TRANS (MINPM, I )-1.0 


B (MINPM) -1 


C 


CALL  GAUSS 
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C 

DO  2000  I-l.MINPM 
UER-UER+1 . 0>vZ  (I )  >v  I 
2000  CONTINUE 

C 

TYPE  l.UER 

1  FORMAT (IX, F10. 6) 

C 

STOP 

END 
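
The tables S and CM built at the start of the program above appear to hold Stirling numbers of the second kind and falling factorials, which together give the probability that a group of requests spread uniformly over some memories lands on exactly a given number of distinct memories. The sketch below illustrates that occupancy calculation. The interpretation and all names are assumptions made for the example; the FORTRAN stores the same recurrence with its indices shifted by one.

```python
from math import perm


# Stirling numbers of the second kind via s(i, j) = j*s(i-1, j) + s(i-1, j-1).
def stirling2_table(n_max):
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for i in range(1, n_max + 1):
        for j in range(1, i + 1):
            s[i][j] = j * s[i - 1][j] + s[i - 1][j - 1]
    return s


# Probability that r independent, uniformly distributed requests to k
# memories address exactly d distinct memories.
def prob_distinct(r, k, d, s):
    return perm(k, d) * s[r][d] / k ** r


table = stirling2_table(16)
dist = [prob_distinct(4, 3, d, table) for d in range(1, 4)]
print(dist)
```

The probabilities over all possible numbers of distinct memories sum to one, which gives a quick check on the tables.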

C
C
      SUBROUTINE GAUSS
C  THIS SUBROUTINE SOLVES SIMULTANEOUS LINEAR EQUATIONS
C  A*X=B
C
      DIMENSION A(17,17),B(17),X(17)
      COMMON A,B,X,N
      INTEGER S
C
      S=N
    5 CONTINUE
      IF(S-1) 50,50,10
   10 CONTINUE
      IF(S.GT.2)GO TO 105
      D=A(1,1)*A(2,2)-A(1,2)*A(2,1)
      IF(ABS(D).GT.0.0005)GO TO 105
      TYPE 25
      GO TO 100
  105 DO 20 I=1,S
      M=S-I+1
      IF(ABS(A(M,S)).GT.0.0005)GO TO 30
   20 CONTINUE
      TYPE 25
   25 FORMAT(1X,'THE COEFFICIENT MATRIX IS SINGULAR',/)
      GO TO 100
   30 CONTINUE
      IF(M.EQ.S)GO TO 40
      T=B(S)
      B(S)=B(M)
      B(M)=T
      DO 35 J=1,S
      T=A(S,J)
      A(S,J)=A(M,J)
      A(M,J)=T
   35 CONTINUE
   40 CONTINUE
      DO 45 I=1,S-1
      K=S-I
      IF(ABS(A(S,S)).GT.0.0005)GO TO 42
      GO TO 100
   42 B(K)=B(K)-B(S)*A(K,S)/A(S,S)
      DO 45 J=1,S-1
      A(K,J)=A(K,J)-A(K,S)*A(S,J)/A(S,S)
   45 CONTINUE
      S=S-1
      GO TO 5
   50 CONTINUE
      DO 70 I=1,N
      SUM=B(I)
      DO 60 J=1,I-1
      SUM=SUM-A(I,J)*X(J)
   60 CONTINUE
      IF(ABS(A(I,I)).GT.0.0005)GO TO 61
      GO TO 100
   61 X(I)=SUM/A(I,I)
   70 CONTINUE
  100 CONTINUE
      RETURN
      END
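
The main programs prepare the system for GAUSS by subtracting 1 from the diagonal of TRANS, overwriting the last row with ones, and setting the last element of B to 1, so that the solution is the stationary distribution of the chain. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the author's routine: `solve` uses Gauss-Jordan elimination with partial pivoting rather than the listing's elimination order, and both function names are hypothetical.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def stationary(trans):
    """trans[j][i] = P(next state j | current state i); each column sums to 1."""
    n = len(trans)
    # (T - I) z = 0, with the last equation replaced by sum(z) = 1.
    a = [[trans[j][i] - (1.0 if i == j else 0.0) for i in range(n)]
         for j in range(n)]
    a[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)
```

For example, a two-state chain with columns (0.9, 0.1) and (0.5, 0.5) has stationary probabilities 5/6 and 1/6.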
C
C ************************************************************
C
C  APPROXIMATE MODEL FOR ARBITRARY P(I,J), TP=TW, M>=N.
C
C  P(I,J)=ALPHA FOR I=J
C        =B OTHERWISE
C
      DIMENSION QPR(16,16)
      DIMENSION RATE(16),Y(16,16)
      DATA ALPHA/0.0,.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,
     1.65,.7,.75,.8,.85,.9,.95,1.0/
      TYPE 1973
 1973 FORMAT(1X,'NUMBER OF PROCESSORS',/)
      ACCEPT 1974,NPC
 1974 FORMAT(I)
      TYPE 1975
 1975 FORMAT(1X,'NUMBER OF MEMORIES',/)
      ACCEPT 1974,NMP
      MINPM=MIN0(NMP,NPC)
      XM=1.0*NMP
      XN=1.0*NPC
C
C  COMPUTE UER FOR VARIOUS VALUES OF ALPHA
C
      DO 9999 IJK=1,21
      A=ALPHA(IJK)
      B=(1.0-A)/(XM-1)
      DO 11 I=1,NPC
      DO 10 J=1,NMP
   10 PROB(I,J)=B
   11 PROB(I,I)=A
C
C  COMPUTE QUEUEING FREQUENCIES BASED ON ACCESS FREQS.
C
      DO 70 I=1,NPC
      DO 70 J=1,NMP
      Y(I,J)=1.0
      DO 50 L=1,NPC
      IF(L.EQ.I)GO TO 50
      Y(I,J)=Y(I,J)*(1.0-PROB(L,J))
   50 CONTINUE
   70 CONTINUE
      DO 60 I=1,NPC
      DO 60 J=1,NMP
      QPR(I,J)=0.0
      DO 65 L=1,NPC
      IF(L.EQ.I)GO TO 65
      QPR(I,J)=PROB(L,J)*(1-Y(L,J))+QPR(I,J)
   65 CONTINUE
      QPR(I,J)=(1.0-Y(I,J))*(QPR(I,J)+1.0)
   60 QPR(I,J)=PROB(I,J)*(Y(I,J)+QPR(I,J))
C
      DO 80 I=1,NPC
      TEMP=0.0
      DO 85 J=1,NMP
   85 TEMP=TEMP+QPR(I,J)
      DO 80 J=1,NMP
      QPR(I,J)=QPR(I,J)/TEMP
   80 CONTINUE
   43 FORMAT(1X,'Q(',I2,',',I2,')=',F9.5)
C
C  COMPUTE UER BASED ON QUEUEING FREQUENCIES
C
      DO 355 J=1,NMP
      RATE(J)=1.0
      DO 35 I=1,NPC
   35 RATE(J)=(1.0-QPR(I,J))*RATE(J)
  355 RATE(J)=1.0-RATE(J)
      DO 36 J=1,NMP
   36 UER(IJK)=UER(IJK)+RATE(J)
C
 9999 CONTINUE
      XMAX=1.0*MINPM
C
C  PLOT RESULTS
C
      CALL PLOT(21,XMAX,UER)
      WRITE(19,332),NPC,NMP
  332 FORMAT(///,1X,'NUMBER OF PROCESSORS =',2X,I2,/,
     11X,'NUMBER OF MEMORIES =',2X,I2,//)
      DO 200 I=1,21
  200 WRITE(19,331),ALPHA(I),UER(I)
  331 FORMAT(1X,' ALPHA=',F5.3,5X,' UER=',F8.5)
      STOP
      END
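
The arithmetic of the listing above can be followed more easily in a compact form. The Python sketch below mirrors the three steps of the listing (the products Y, the adjusted and row-normalised frequencies QPR, and the sum of per-memory rates); it is a transcription for illustration, and the function name `approximate_uer` is hypothetical.

```python
def approximate_uer(prob):
    """prob[i][j] = access probability of processor i to memory j (rows sum to 1)."""
    npc, nmp = len(prob), len(prob[0])

    # y[i][j]: probability that no processor other than i requests memory j.
    y = [[1.0] * nmp for _ in range(npc)]
    for i in range(npc):
        for j in range(nmp):
            for l in range(npc):
                if l != i:
                    y[i][j] *= 1.0 - prob[l][j]

    # Queueing frequencies, following the QPR statements of the listing.
    q = [[0.0] * nmp for _ in range(npc)]
    for i in range(npc):
        for j in range(nmp):
            s = sum(prob[l][j] * (1.0 - y[l][j]) for l in range(npc) if l != i)
            s = (1.0 - y[i][j]) * (s + 1.0)
            q[i][j] = prob[i][j] * (y[i][j] + s)
        row_total = sum(q[i])
        q[i] = [v / row_total for v in q[i]]

    # A memory is busy unless every processor leaves it alone.
    uer = 0.0
    for j in range(nmp):
        idle = 1.0
        for i in range(npc):
            idle *= 1.0 - q[i][j]
        uer += 1.0 - idle
    return uer
```

Two boundary cases serve as a sanity check: a single processor always yields 1.0, and processors that each use a private memory (ALPHA = 1 in the listing's notation) yield one busy memory per processor.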


C  PLOTTING SUBROUTINE
      SUBROUTINE PLOT(LIMIT,XMAX,VALUE)
C
      DIMENSION VALUE(100)
      DIMENSION J(101)
      J (100) . -
      WRITE(19,2)
    2 FORMAT (11X, ' .
     1 .
      DO 10 I=1,LIMIT
      IX=100.0*VALUE(I)/XMAX+0.5
      ILAST=IX
      DO 5 K=1,IX
    5 J(K)=' '
      J(IX+1)='*'
      J(1)='.'
      WRITE(19,11),(J(L),L=1,IX+1)
   11 FORMAT(11X,101(A1))
      WRITE(19,59)
   59 FORMAT(/)
   10 CONTINUE
      RETURN
      END
C
C ************************************************************
C
C  APPROXIMATE MARKOV CHAIN MODEL FOR TP=TW+TC
C
C ************************************************************
C
      DIMENSION S(17,17),CM(16,16),TRANS(17,17)
      DIMENSION Z(17),B(17)
      COMMON TRANS,B,Z,NSTATE
C
      TYPE 1973
 1973 FORMAT(1X,'NUMBER OF PROCESSORS',/)
      ACCEPT 1974,N
 1974 FORMAT(I)
      TYPE 1975
 1975 FORMAT(1X,'NUMBER OF MEMORIES',/)
      ACCEPT 1974,M
      XM=1.0*M
      K=MIN0(N,M)
      NSTATE=K+1
C
      IF(K.NE.N)GO TO 100
      TRANS(NSTATE,1)=1.0
      I1=1
  100 CONTINUE
C
      DO 300 I=1,17
      S(I,1)=0
  300 S(1,I)=0
      S(1,1)=1
C
      DO 400 I=1,16
      DO 400 J=1,I
  400 S(I+1,J+1)=1.0*J*S(I,J+1)+S(I,J)
C
      DO 500 I=1,16
      CM(I,1)=I
      DO 500 J=2,I
  500 CM(I,J)=CM(I,J-1)*1.0*(I-J+1)
C
      DO 1000 II=I1+1,NSTATE
C
C  II.... INDEX TO THE STATE DURING CURRENT CYCLE
C
      I=II+N-K-1
C
C  I..... NUMBER OF PROCESSORS MAKING NEW REQUEST IN CURRENT CYCLE
C
      X=XM**I
      KK=MIN0(I,M)
      DO 900 JJ=1,KK
C
C  JJ.... NUMBER OF BUSY MPS IN CURRENT CYCLE
C  J..... NUMBER OF PCS MAKING NEW REQUEST DURING NEXT CYCLE
C  J1.... INDEX TO STATE AT NEXT CYCLE
C
      J=N-JJ
      J1=J-N+NSTATE
      TRANS(J1,II)=1.0*CM(M,JJ)*S(I+1,JJ+1)/X
  900 CONTINUE
C
 1000 CONTINUE
C
      DO 8000 I=1,NSTATE
      TRANS(I,I)=TRANS(I,I)-1.0
 8000 TRANS(NSTATE,I)=1.0
C
      B(NSTATE)=1
      CALL GAUSS
C
      DO 2000 II=I1+1,NSTATE
      I=II+N-K-1
      UER=UER+XM*(1.0-(1.0-1.0/XM)**I)*Z(II)
 2000 CONTINUE
C
      TYPE 1,UER
    1 FORMAT(1X,F10.6)
C
      STOP
      END


      SUBROUTINE GAUSS
C  THIS SUBROUTINE SOLVES SIMULTANEOUS LINEAR EQUATIONS
C  A*X=B
C
      DIMENSION A(17,17),B(17),X(17)
      COMMON A,B,X,N
      INTEGER S
C
      S=N
    5 CONTINUE
      IF(S-1) 50,50,10
   10 CONTINUE
      IF(S.GT.2)GO TO 105
      D=A(1,1)*A(2,2)-A(1,2)*A(2,1)
      IF(ABS(D).GT.0.0005)GO TO 105
      TYPE 25
      GO TO 100
  105 DO 20 I=1,S
      M=S-I+1
      IF(ABS(A(M,S)).GT.0.0005)GO TO 30
   20 CONTINUE
      TYPE 25
   25 FORMAT(1X,'THE COEFFICIENT MATRIX IS SINGULAR',/)
      GO TO 100
   30 CONTINUE
      IF(M.EQ.S)GO TO 40
      T=B(S)
      B(S)=B(M)
      B(M)=T
      DO 35 J=1,S
      T=A(S,J)
      A(S,J)=A(M,J)
      A(M,J)=T
   35 CONTINUE
   40 CONTINUE
      DO 45 I=1,S-1
      K=S-I
      IF(ABS(A(S,S)).GT.0.0005)GO TO 42
      GO TO 100
   42 B(K)=B(K)-B(S)*A(K,S)/A(S,S)
      DO 45 J=1,S-1
      A(K,J)=A(K,J)-A(K,S)*A(S,J)/A(S,S)
   45 CONTINUE
      S=S-1
      GO TO 5
   50 CONTINUE
      DO 70 I=1,N
      SUM=B(I)
      DO 60 J=1,I-1
      SUM=SUM-A(I,J)*X(J)
   60 CONTINUE
      IF(ABS(A(I,I)).GT.0.0005)GO TO 61
      GO TO 100
   61 X(I)=SUM/A(I,I)
   70 CONTINUE
  100 CONTINUE
      RETURN
      END

C ************************************************************
C
C  APPROXIMATE MARKOV CHAIN MODEL FOR TP>TW
C  PROB(TP=TW+I*TC)=BETA*ALPHA**I
C  E(TP)=TW+TC*ALPHA/BETA
C  ALPHA+BETA=1
C
C ************************************************************


      DIMENSION S(17,17),CM(16,16),TRANS(17,17)
      INTEGER COMB(16,16)
      DIMENSION Z(17),B(17)
      INTEGER D
      COMMON TRANS,B,Z,NSTATE
C
C
      TYPE 1973
 1973 FORMAT(1X,'NUMBER OF PROCESSORS',/)
      ACCEPT 1974,N
 1974 FORMAT(I)
      TYPE 1975
 1975 FORMAT(1X,'NUMBER OF MEMORIES',/)
      ACCEPT 1974,M
      XM=1.0*M
      TYPE 1976
 1976 FORMAT(1X,'ALPHA : PROB OF ONE MORE CYCLE OF
     1 EXECUTION',/)
      ACCEPT 1977,ALPHA
 1977 FORMAT(F)
      BETA=1.0-ALPHA
      NSTATE=N+1
C
C
      DO 300 I=1,17
      S(I,1)=0
  300 S(1,I)=0
      S(1,1)=1
C
      DO 400 I=1,16
      DO 400 J=1,I
  400 S(I+1,J+1)=1.0*J*S(I,J+1)+S(I,J)
C
      DO 500 I=1,16
      CM(I,1)=I
      DO 500 J=2,I
  500 CM(I,J)=CM(I,J-1)*1.0*(I-J+1)
C
      COMB(1,1)=1
      DO 20 K=2,16
      COMB(K,1)=K
      L=MIN0(K,16)
      DO 10 I=2,L
   10 COMB(K,I)=COMB(K,I-1)*(K-I+1)/I
   20 CONTINUE
C
C
      DO 1000 I=1,N
      X=XM**I
C
C  K IS THE MAX. NO. OF OCCUPIED MP'S
C  D IS THE NUMBER OF OCC. MP'S
C
      K=MIN0(I,M)
      DO 1000 D=1,K
C
C  I-D PC ARE LEFT IN MP QUEUES
C  NN PC ARE TO BE REASSIGNED TO PC OR MP QUEUES
C  XPROB IS THE PROB OF D MP'S BEING BUSY
C
      XPROB=CM(M,D)*S(I+1,D+1)/X
  900 NN=N-I+D
      TRANS(I-D+1,I+1)=TRANS(I-D+1,I+1)+XPROB*ALPHA**NN
      IF(NN.EQ.0)GO TO 1000
C
      DO 1000 NEWPC=1,NN
C
C  NEWPC= NO. OF NEW PC'S TO BE ASSIGNED TO MP QUEUES
C  NEWMP= TOTAL NO. OF PC'S THAT MAKE MP REQ. NEXT CYCLE
C
      NEWMP=I-D+NEWPC
C
      TEMP=1.0*COMB(NN,NEWPC)*(BETA**NEWPC)*(ALPHA**(NN-NEWPC))
      TRANS(NEWMP+1,I+1)=TRANS(NEWMP+1,I+1)+XPROB*TEMP
C
 1000 CONTINUE
C
      DO 1500 I=1,NSTATE
 1500 TRANS(I,1)=TRANS(I,2)
      DO 8000 I=1,NSTATE
      TRANS(I,I)=TRANS(I,I)-1.0
 8000 TRANS(NSTATE,I)=1.0
C
      B(NSTATE)=1
      CALL GAUSS
      DO 2000 I=1,N
      UER=UER+XM*(1.0-(1.0-1.0/XM)**I)*Z(I+1)
 2000 CONTINUE
C
      TYPE 1,UER
    1 FORMAT(1X,F10.6)
      STOP
      END


      SUBROUTINE GAUSS
C  THIS SUBROUTINE SOLVES SIMULTANEOUS LINEAR EQUATIONS
C  A*X=B
C
      DIMENSION A(17,17),B(17),X(17)
      COMMON A,B,X,N
      INTEGER S
C
      S=N
    5 CONTINUE
      IF(S-1) 50,50,10
   10 CONTINUE
      IF(S.GT.2)GO TO 105
      D=A(1,1)*A(2,2)-A(1,2)*A(2,1)
      IF(ABS(D).GT.0.0005)GO TO 105
      TYPE 25
      GO TO 100
  105 DO 20 I=1,S
      M=S-I+1
      IF(ABS(A(M,S)).GT.0.0005)GO TO 30
   20 CONTINUE
      TYPE 25
   25 FORMAT(1X,'THE COEFFICIENT MATRIX IS SINGULAR',/)
      GO TO 100
   30 CONTINUE
      IF(M.EQ.S)GO TO 40
      T=B(S)
      B(S)=B(M)
      B(M)=T
      DO 35 J=1,S
      T=A(S,J)
      A(S,J)=A(M,J)
      A(M,J)=T
   35 CONTINUE
   40 CONTINUE
      DO 45 I=1,S-1
      K=S-I
      IF(ABS(A(S,S)).GT.0.0005)GO TO 42
      GO TO 100
   42 B(K)=B(K)-B(S)*A(K,S)/A(S,S)
      DO 45 J=1,S-1
      A(K,J)=A(K,J)-A(K,S)*A(S,J)/A(S,S)
   45 CONTINUE
      S=S-1
      GO TO 5
   50 CONTINUE
      DO 70 I=1,N
      SUM=B(I)
      DO 60 J=1,I-1
      SUM=SUM-A(I,J)*X(J)
   60 CONTINUE
      IF(ABS(A(I,I)).GT.0.0005)GO TO 61
      GO TO 100
   61 X(I)=SUM/A(I,I)
   70 CONTINUE
  100 CONTINUE
      RETURN
      END




C ************************************************************
C
C  CACHE MODEL FOR TP>=TW, CONSTANT.
C
C ************************************************************
C
      TYPE 1
      ACCEPT 4,TP
    1 FORMAT(1X,'ENTER VALUE OF TP, FORMAT F',/)
      TYPE 2
      ACCEPT 4,TF
    2 FORMAT(1X,'ENTER VALUE OF TF',/)
      TYPE 5
      ACCEPT 4,TC
    5 FORMAT(1X,'ENTER VALUE OF TC',/)
      TYPE 6
      ACCEPT 4,TW
    6 FORMAT(1X,'ENTER VALUE OF TW',/)
    4 FORMAT(F)
      TYPE 1373
 1373 FORMAT(1X,'NUMBER OF PROCESSORS, INTEGER FORMAT',/)
      ACCEPT 1374,N
 1374 FORMAT(I)
      TYPE 1375
 1375 FORMAT(1X,'NUMBER OF MEMORIES',/)
      ACCEPT 1374,M
      XN=1.0*N
      XM=1.0*M
      TA=TC-TW
      DO 2000 I=0,3
      ALPHA=0.1*I
      BETA=1.0-ALPHA
      CONS=ALPHA*(TP+TF)+BETA*(TP-TW)
C
      PMAX=1.0
      PMIN=0.0
      P=0.5
   10 PNEW=1.0-CONS*(1.0-(1.0-P/XM)**N)*XM/XN/TC/BETA
      DEL=P-PNEW
      DEL=ABS(DEL)
      IF(DEL.LE.0.0001)GO TO 1000
      IF(PNEW.GT.P)PMIN=P
      IF(PNEW.LT.P)PMAX=P
      P=0.5*(PMIN+PMAX)
      GO TO 10
 1000 P=0.5*(PNEW+P)
      UER=1.0-(1.0-P/XM)
      UER=UER*XM/BETA
 2000 TYPE 20,UER
   20 FORMAT(1X,'EXECUTION RATE=',F8.4)
      STOP
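
The loop at label 10 in the cache model finds a self-consistent value of P by interval halving: P is raised when the recomputed value exceeds it and lowered otherwise. The Python sketch below illustrates only that search. The function name and the iteration cap are assumptions added here; the parameter names follow the listing's variables CONS, TC, BETA, N and M, whose definitions are given in the listing itself.

```python
def solve_request_probability(cons, tc, beta, n, m, tol=1e-4, max_iter=200):
    """Find p with p = g(p) by bisection on [0, 1], as in the loop at label 10."""

    def g(p):
        # Right-hand side of the PNEW statement in the listing.
        return 1.0 - cons * (1.0 - (1.0 - p / m) ** n) * m / n / tc / beta

    p_min, p_max = 0.0, 1.0
    p = 0.5
    for _ in range(max_iter):
        p_new = g(p)
        if abs(p - p_new) <= tol:
            return 0.5 * (p + p_new)
        if p_new > p:
            p_min = p      # current guess is too small
        else:
            p_max = p      # current guess is too large
        p = 0.5 * (p_min + p_max)
    raise RuntimeError("bisection did not converge")
```

Because g decreases as p grows, the sign of g(p) - p tells which half of the interval holds the solution. With n = m = 1 the equation reduces to p = 1 - k*p with k = cons/(tc*beta), so the result can be checked against 1/(1+k).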
MULTIPROCESSOR SYSTEM SIMULATOR


This simulator can be used for multiprocessor systems with NPC processors
and NMP memory units connected by a single crosspoint switch. Pc[i] has a
probability PROB(I,J) of requesting service from memory j. The matrix PROB can
be carefully chosen to model partially assigned crosspoints and private caches;
e.g., if memory unit 1 is a private cache dedicated to Pc[1], then PROB(1,1) =
Prob{Pc[1] hits cache} and PROB(K,1) = 0 for K≠1.

The  simulation  program  consists  of  four  main  parts: 

(i)  Main  Program 

(ii)  Subroutine  PC(ID) 

(iii) Subroutine  MP(ID) 

(iv)  BLOCK  DATA 


The main program contains the scheduler, which calls the two subroutines
when a processor or memory is activated. This is a discrete event simulator
where the two subroutines are analogous to activities in SIMULA. The scheduling
is done by maintaining a doubly linked list called the sequencing set or SQS.
Figure B-1 shows the structure of the SQS array.
The first element, or head, of the SQS is a dummy element whose
successor is the first real event notice. Thus, when the SQS is empty (i.e., it
holds no event notices), the first element of the SQS points to itself. The SQS
is maintained in proper time order, with the most imminent event as the
successor of the head.
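
The sequencing set described above can be sketched in a few lines. The Python class below is an illustration under assumptions, not a transcription of the FORTRAN: the class and method names are hypothetical, the dummy head is stored in slot 0, and free slots are kept on a simple stack. Like subroutine INSERT, it places a new notice after every notice scheduled at or before the same time, so notices with equal times leave in the order they arrived.

```python
class SequencingSet:
    """Time-ordered, doubly linked event list with a dummy head in slot 0."""

    HEAD = 0

    def __init__(self, capacity):
        size = capacity + 1
        self.suc = [self.HEAD] * size      # successor links; head points to itself
        self.pred = [self.HEAD] * size     # predecessor links
        self.time = [0] * size             # scheduled time of each notice
        self.event = [None] * size         # payload, e.g. (event type, unit id)
        self.free = list(range(capacity, 0, -1))  # stack of unused slots

    def is_empty(self):
        return self.suc[self.HEAD] == self.HEAD

    def insert(self, time, event):
        if not self.free:
            raise OverflowError("scheduler overflow")
        slot = self.free.pop()
        # Walk to the last notice scheduled no later than `time`.
        k = self.HEAD
        while self.suc[k] != self.HEAD and self.time[self.suc[k]] <= time:
            k = self.suc[k]
        j = self.suc[k]
        self.suc[slot], self.pred[slot] = j, k
        self.suc[k] = slot
        self.pred[j] = slot
        self.time[slot], self.event[slot] = time, event

    def pop_next(self):
        i = self.suc[self.HEAD]
        if i == self.HEAD:
            raise IndexError("event list empty")
        j = self.suc[i]
        self.suc[self.HEAD] = j
        self.pred[j] = self.HEAD
        self.free.append(i)                # slot becomes reusable
        return self.time[i], self.event[i]
```

A scheduler loop would call `pop_next` repeatedly, advance the clock to the returned time, and dispatch on the event payload.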

Figure B-2 depicts a simple flow chart for the main program. The program
executes in a loop until a preset simulation time is reached. At this stage it
jumps out of the loop, operates on the statistics collected, and outputs
parameters such as the number of instructions executed, the execution rate, and
the queue lengths and waiting times for the various memories. A typical output
is shown in fig. B-3.

The processor activity is characterized by a subroutine PC(ID); the flow
chart is depicted in fig. B-4. A uniform random number with range (0,1) is
compared with the cumulative access probabilities in the array PROB to select a
memory. A request is entered into the queue corresponding to that memory unit.
An event notice for activating the memory is entered into the SQS if necessary.
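
The memory selection step can be illustrated as follows. This Python sketch assumes the access probabilities of one processor are given as a row that sums to 1; the function names are hypothetical. It returns the first memory whose cumulative probability is at least the random number. One deliberate difference from the listing: if rounding leaves the random number above the last cumulative value, the sketch clamps to the last memory, whereas subroutine PC draws a new random number.

```python
import bisect
import itertools


def cumulative(row):
    """Turn access probabilities into cumulative access probabilities."""
    return list(itertools.accumulate(row))


def select_memory(cum_row, u):
    """Index of the first memory j with u <= cum_row[j] (0-based)."""
    j = bisect.bisect_left(cum_row, u)
    return min(j, len(cum_row) - 1)   # clamp in case of rounding at the top end
```

For a row (0.25, 0.5, 0.25) the cumulative values are (0.25, 0.75, 1.0), so a random number of 0.26 selects the second memory and 0.9 selects the third.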

Figure B-5 illustrates the working of the subroutine MP(ID). In its current
version, a processor is selected as per FIFO discipline. The processor is then
scheduled to be activated after a time interval equal to the sum of the memory
access time (ta) and the processor's execution time (tp). The processing time
distribution can be arbitrarily chosen. The program listed here has an
exponentially distributed processing time. The memory reschedules itself at the
end of the cycle if any requests are pending in its queue.


Figure B-2 Flow chart of the main program.


NO. OF UNIT INSTR EXEC IN 500000 TIMEUNITS IS 3465

UNIT EXEC RATE= 0.006930

UNIT INSTRUCTIONS EXECUTED BY INDIVIDUAL PROCESSORS
868
869
864
864

ACCESS FREQUENCIES

0.297235  0.381336  0.321429
0.303797  0.331415  0.364787
0.303241  0.365741  0.331019
0.312500  0.343750  0.343750

****** MEMORY UNIT 1 ******
NO. OF REFS= 1054
MAX Q LENGTH= 2
AVG. Q LENGTH= 0.01467
AVG. WAITING TIME= 6.9573

****** MEMORY UNIT 2 ******
NO. OF REFS= 1232
MAX Q LENGTH= 2
AVG. Q LENGTH= 0.02154
AVG. WAITING TIME= 8.7419

****** MEMORY UNIT 3 ******
NO. OF REFS= 1179
MAX Q LENGTH= 2
AVG. Q LENGTH= 0.02107
AVG. WAITING TIME= 8.9372

AVERAGE VALUE= 0.02108

AVERAGE VALUE= 0.02464

AVERAGE VALUE= 0.02358


Figure B-3 Typical Simulator Output

The BLOCK DATA contains input parameters for the simulation.

This simulator could have been written in SIMULA. However, the scheduling
involved does not require all the capabilities of SIMULA. Moreover, the easy
access to FORTRAN on the PDP-10's time-sharing system was an important
consideration in its selection.


Figure B-4 Flow chart of the processor's activity. Boxes, in order: select the
memory to be accessed; enter request into the queue of the selected memory;
enter an event notice for the memory, either at the end of its current cycle
or at the current value of TIME; RETURN.
Figure B-5 Flow chart of the memory's activity. Boxes, in order: identify the
first processor request in the queue; enter an event notice for the processor
at TIME = memory access time + processing time; if there is any other request
in the queue, enter an event notice for itself at the end of this memory cycle.
C  PROCESSING TIME -- EXPONENTIAL
C  THIS PROGRAM SIMULATES A MULTIPROCESSOR WITH A
C  CROSS-POINT BETWEEN PROCESSORS & MEMORIES
      INTEGER SQS(100,5),EMPTY(99)
      INTEGER SUC,PRED,SCHED,EVTYPE,EVID,EVNUM,TOPMT
      COMMON/SQSET/ SQS,EMPTY,SUC,PRED,SCHED,EVTYPE,EVID,TOPMT
      INTEGER TA(50),TC(50),TP(50)
      INTEGER TIME,SIMTIM
      COMMON/GLOBAL/ TA,TC,TP,TIME,NPC,NMP,SIMTIM,NINSTR
      DIMENSION COUNT(50,50),INSTR(50)
      COMMON/MEM/COUNT,INSTR
      DIMENSION PROB(50,50)
      COMMON/PROC/ PROB,ISEED
      INTEGER TTQ(50),TLAST(50),LENGTH(50),MAXQL(50)
      COMMON/STAT/TTQ,TLAST,LENGTH,MAXQL
      INTEGER QMP(50,50),FIRST(50),NEXT(50),NEXTAC(50)
      COMMON/Q/ QMP,FIRST,NEXT,NEXTAC
      COMMON/EXP/ISD
C  INITIALIZE SQS AND EMPTY LIST
      DIMENSION RATE(50),JMP(50),JPC(50)
      DIMENSION QPR(50,50),TEMP(50)
      DATA SUC,PRED,SCHED,EVTYPE,EVID/1,2,3,4,5/
      DATA TOPMT/1/
      DO 21 I=1,99
   21 EMPTY(I)=I+1
      DO 22 I=1,NMP
      NEXT(I)=1
   22 FIRST(I)=1
      SQS(1,SUC)=1
      SQS(1,PRED)=1
C  ACTIVATE ALL PROCESSORS AT TIME=0
      DO 23 I=1,NPC
   23 CALL INSERT(1,I,0)
C  CONVERT ACCESS PROBS. TO CUMULATIVE ACCESS PROBS.
      DO 24 I=1,NPC
      DO 24 J=2,NMP
   24 PROB(I,J)=PROB(I,J)+PROB(I,J-1)
C  THIS IS THE SCHEDULER
C  REMOVE 'FIRST' ELEMENT IN SQS AND RESTORE PTRS.
 1000 I=SQS(1,SUC)
      IF(I.EQ.1)TYPE 701
  701 FORMAT(1X,'EVENT LIST EMPTY')
      J=SQS(I,SUC)
      SQS(1,SUC)=J
      SQS(J,PRED)=1
C  UPDATE TIME
      TIME=SQS(I,SCHED)
      IF(TIME.GE.SIMTIM)GO TO 2001
      EVNUM=SQS(I,EVTYPE)
      ID=SQS(I,EVID)
C  UPDATE EMPTY LIST
      TOPMT=TOPMT-1
      EMPTY(TOPMT)=I
C  ACTIVATE CURRENT EVENT
      GO TO (1,2) EVNUM
  702 FORMAT(1X,'SCHEDULER HAS UNKNOWN EVENT TYPE')
      STOP
    1 CALL PC(ID)
      GO TO 1000
    2 CALL MP(ID)
      GO TO 1000
C  OUTPUT STATISTICS
 2001 CONTINUE
      DO 2002 I=1,NMP
 2002 NINSTR=NINSTR+INSTR(I)
      TIME=SIMTIM
      WRITE(19,41),TIME,NINSTR
   41 FORMAT(1X,'NO. OF UNIT INSTR EXEC IN',2X,I12,2X,'TIME
     1UNITS IS',2X,I12)
      UER=NINSTR/TIME*1.0
      WRITE(19,40),UER
      TYPE 40,UER
   40 FORMAT(1X,'UNIT EXEC RATE=',F10.6)
C  FIND NO. OF INSTR EXEC BY EACH PROCESSOR
      DO 31 I=1,NPC
      DO 31 J=1,NMP
   31 JPC(I)=JPC(I)+COUNT(I,J)
C  FIND NO. OF ACCESSES TO EACH MEMORY
      DO 32 I=1,NMP
      DO 32 J=1,NPC
   32 JMP(I)=JMP(I)+COUNT(J,I)
      WRITE(19,42),(JPC(I),I=1,NPC)
   42 FORMAT(/,1X,'UNIT INSTRUCTIONS EXECUTED BY INDIVIDUAL
     1 PROCESSORS',/,50(1X,I10,/))
C  FIND ACCESS FREQUENCIES FOR THIS SIMULATION RUN
      DO 33 I=1,NPC
      DO 33 J=1,NMP
   33 PROB(I,J)=COUNT(I,J)/JPC(I)
      WRITE(19,443)
  443 FORMAT(1X,'ACCESS FREQUENCIES',/)
      DO 333 I=1,NPC
  333 WRITE(19,43),(PROB(I,J),J=1,NMP)
   43 FORMAT(1X,8(F8.6,2X))
C  COMPUTE AVG WAIT TIME,AVG Q LENGTH
      DO 34 I=1,NMP
      AVQL=1.0*TTQ(I)/TIME
      AVWT=1.0*TTQ(I)/JMP(I)
   34 WRITE(19,44),I,JMP(I),MAXQL(I),AVQL,AVWT
   44 FORMAT(/,1X,'****** MEMORY UNIT ',I2,' ******',/,
     11X,'NO. OF REFS=',I12,/,1X,'MAX Q LENGTH=',I2,/,
     11X,'AVG. Q LENGTH=',F8.5,/,1X,'AVG.
     2WAITING TIME=',F10.4)
      N=TIME/10
      DO 801 ID=1,NMP
      AV=INSTR(ID)/TIME*10.0
  801 WRITE(19,202),AV
  202 FORMAT(/,1X,'AVERAGE VALUE=',F10.5)
      STOP
      END


      SUBROUTINE PC(ID)
      DIMENSION PROB(50,50)
      COMMON/PROC/ PROB,ISEED
      INTEGER QMP(50,50),FIRST(50),NEXT(50),NEXTAC(50)
      COMMON/Q/ QMP,FIRST,NEXT,NEXTAC
      INTEGER TA(50),TC(50),TP(50)
      INTEGER TIME,SIMTIM
      COMMON/GLOBAL/ TA,TC,TP,TIME,NPC,NMP,SIMTIM,NINSTR
      INTEGER TTQ(50),TLAST(50),LENGTH(50),MAXQL(50)
      COMMON/STAT/TTQ,TLAST,LENGTH,MAXQL
C  PROB(I,J)= PROBABILITY(PC(I) ACCESSES MP(J))
C  GENERATE A UNIFORM RV & DETERMINE MEMORY TO BE ACCESSED
    1 RV=UNIRAN(ISEED)
      RV=RV+1.0/NPC*ID
      IF(RV.GT.1.0)RV=RV-1
      DO 10 I=1,NMP
      IF(RV.LE.PROB(ID,I))GO TO 100
   10 CONTINUE
      GO TO 1
  100 IMP=I
C  INCREMENT Q MEASURE COUNTERS
      TTQ(IMP)=TTQ(IMP)+(TIME-TLAST(IMP))*LENGTH(IMP)
      LENGTH(IMP)=LENGTH(IMP)+1
      IF(LENGTH(IMP).GT.MAXQL(IMP))MAXQL(IMP)=LENGTH(IMP)
      TLAST(IMP)=TIME
C  PUT IN REQUEST IN QUEUE FOR CHOSEN MEMORY
      QMP(IMP,NEXT(IMP))=ID
      NEXT(IMP)=NEXT(IMP)+1
      IF(NEXT(IMP).GT.NPC)NEXT(IMP)=1
C  CHECK IF MEMORY UNIT HAS OTHER REQUESTS QUEUED
      IF(LENGTH(IMP).GT.1)RETURN
C  CHECK IF MEMORY IS READY TO GRANT THIS ACCESS NOW
      IF(TIME.GT.NEXTAC(IMP))GO TO 200
      CALL INSERT(2,IMP,NEXTAC(IMP))
      RETURN
  200 CALL INSERT(2,IMP,TIME)
      RETURN
      END


      SUBROUTINE MP(ID)
      INTEGER QMP(50,50),FIRST(50),NEXT(50),NEXTAC(50)
      COMMON/Q/ QMP,FIRST,NEXT,NEXTAC
      INTEGER TA(50),TC(50),TP(50)
      INTEGER TIME,SIMTIM
      COMMON/GLOBAL/ TA,TC,TP,TIME,NPC,NMP,SIMTIM,NINSTR
      DIMENSION COUNT(50,50),INSTR(50)
      COMMON/MEM/COUNT,INSTR
      INTEGER TTQ(50),TLAST(50),LENGTH(50),MAXQL(50)
      COMMON/STAT/TTQ,TLAST,LENGTH,MAXQL
      COMMON/EXP/ISEED
      IF(LENGTH(ID).EQ.0)TYPE 703
  703 FORMAT(1X,'MEMORY ACTIVATED WITHOUT SCHEDULED REQUEST')
C  INCREMENT Q MEASURE COUNTERS
      INSTR(ID)=INSTR(ID)+1
      TTQ(ID)=TTQ(ID)+(TIME-TLAST(ID))*LENGTH(ID)
      LENGTH(ID)=LENGTH(ID)-1
      TLAST(ID)=TIME
C  IDENTIFY PC TO BE SERVICED AS PER FIFO STRATEGY
      IPC=QMP(ID,FIRST(ID))
      FIRST(ID)=FIRST(ID)+1
      COUNT(IPC,ID)=COUNT(IPC,ID)+1
      IF(FIRST(ID).GT.NPC)FIRST(ID)=1
 9374 RV=UNIRAN(ISEED)
      TPROC=450-TP(IPC)*ALOG(RV)
      IF(TPROC.GT.10*TP(IPC))GO TO 9374
      JTIME=TIME+TA(ID)+TPROC
C  SCHEDULE PC
      CALL INSERT(1,IPC,JTIME)
      NEXTAC(ID)=TIME+TC(ID)
      IF(LENGTH(ID).EQ.0)RETURN
      CALL INSERT(2,ID,NEXTAC(ID))
      RETURN
      END
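
The statements at label 9374 in MP(ID) draw the processing time by inverting the exponential distribution and reject any draw longer than ten times TP. A minimal Python sketch of that sampling step is given below for illustration. The function name and the `rng` parameter are assumptions introduced here, and the constant offset that appears in the listing is exposed as an `offset` argument defaulting to zero rather than fixed.

```python
import math
import random


def processing_time(mean, offset=0.0, cap_factor=10.0, rng=random.random):
    """Exponential processing time by inversion, redrawn if it exceeds the cap."""
    while True:
        u = rng()
        if u <= 0.0:
            continue                      # log(0) is undefined; draw again
        t = offset - mean * math.log(u)   # inverse of the exponential CDF
        if t <= cap_factor * mean:
            return t
```

Rejecting long draws truncates the tail of the distribution, so the realised mean is slightly below `mean`.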


      SUBROUTINE INSERT(JTYPE,ID,TSCHED)
C  THIS ROUTINE INSERTS ELEMENT I AFTER K WRT TIME
      INTEGER TSCHED
      INTEGER SQS(100,5),EMPTY(99)
      INTEGER SUC,PRED,SCHED,EVTYPE,EVID,EVNUM,TOPMT
      COMMON/SQSET/ SQS,EMPTY,SUC,PRED,SCHED,EVTYPE,EVID,TOPMT
  300 IF(TOPMT.EQ.0) TYPE 704
  704 FORMAT(1X,'SCHEDULER OVERFLOW')
      IF(SQS(1,SUC).EQ.1) GO TO 320
      J=SQS(1,SUC)
  310 CONTINUE
      IF(SQS(J,SCHED).GT.TSCHED) GO TO 330
      IF(SQS(J,SUC).EQ.1)GO TO 340
      J=SQS(J,SUC)
      GO TO 310
  320 K=1
      GO TO 100
  330 K=SQS(J,PRED)
      GO TO 100
  340 K=J
  100 I=EMPTY(TOPMT)
      SQS(I,SUC)=SQS(K,SUC)
      J=SQS(K,SUC)
      SQS(I,PRED)=K
      SQS(K,SUC)=I
      SQS(J,PRED)=I
      SQS(I,SCHED)=TSCHED
      SQS(I,EVTYPE)=JTYPE
      SQS(I,EVID)=ID
      TOPMT=TOPMT+1
      RETURN
      END



      REAL FUNCTION UNIRAN(JSEED)
      JSEED=JSEED*3141592621
      JSEED=JSEED+7261067113
      IF(JSEED) 1,2,2
    1 JSEED=JSEED+34359738367+1
    2 TEMP=JSEED
      UNIRAN=TEMP*0.2910380346E-10
      RETURN
      END


